Abstract

We study the asymptotic behavior of Monte Carlo methods. Local consistency
is one of the ideal properties of a Monte Carlo method. However, local
consistency may fail to hold for several reasons, and in practice it is often
more important to study such non-ideal behavior. We call one such non-ideal
behavior of Monte Carlo methods local degeneracy. We show some equivalent
conditions for local degeneracy. As an application, we study a Gibbs sampler
(data augmentation) for the cumulative logit model with or without marginal
augmentation. It is well known that the natural Gibbs sampler does not work
well for this model. In the sense of local consistency and degeneracy,
marginal augmentation is shown to improve the asymptotic property. However,
when the number of categories is large, neither method is locally consistent.

1 Introduction 

This paper investigates poor behavior of Markov chain Monte Carlo (MCMC)
methods. There is a vast literature on sufficient conditions for good
behavior, namely ergodicity: see the reviews [16] and [15] and textbooks such as
[14] and [13]. The Markov probability transition kernel of an MCMC method is
Harris recurrent under fairly general assumptions. Moreover, it is sometimes
geometrically ergodic. In practice, however, the performance can be bad even if
the kernel is geometrically ergodic.

In [5] we introduced a framework for the analysis of Monte Carlo procedures.
A Monte Carlo procedure is defined as a pair $\mathcal{M} = (M, e)$ of an
underlying probability structure $M$ and a sequence of "estimators"
$e = (e_m; m = 1, 2, \ldots)$ for the target probability distribution. Using the
framework we defined consistency, which is a good behavior of a Monte Carlo
procedure. In the current study, we apply the framework to study bad behavior.

There are several bad behaviors of Monte Carlo procedures. Two extreme
cases are (a) the sequence generated by the Monte Carlo procedure has a very
poor mixing property, and (b) the sequence goes out to infinity. We call (a)
degeneracy, and this paper is devoted to the study of that property. For (b),
see Examples 3.2 and 3.3 of [5].

1.1 Degeneracy 

To describe degeneracy more precisely, we consider a numerical simulation for
the following simple model:

$$P(y = 1 \mid \theta, x) = \Phi(\theta x), \qquad P(y = 0 \mid \theta, x) = 1 - P(y = 1 \mid \theta, x)$$

where $x$ is an $\mathbb{R}$-valued explanatory variable, $\theta$ is a parameter,
and $\Phi$ is the cumulative distribution function of the standard normal
distribution (see Section 1.2.2). The explanatory variable $x$ is generated from
the uniform distribution on $(0, 1)$. We define two Gibbs samplers
$\mathcal{M}_n$ and $\mathcal{N}_n$.

1.1.1 Gibbs sampler $\mathcal{M}_n$

Assume we have observations $y_n = (y^1, \ldots, y^n)$ and $x_n = (x^1, \ldots, x^n)$,
and the prior distribution of $\theta$ is the standard normal distribution.
There are two ways to construct a Gibbs sampler. One way is to introduce a
latent variable $z \in \mathbb{R}$ drawn from $N(0, 1)$ and set

$$y = \begin{cases} 1 & \text{if } z \le \theta x, \\ 0 & \text{if } z > \theta x. \end{cases}$$

Then the Gibbs sampler is generated by iterating the following procedure: for
given $\theta$ and for $i = 1, \ldots, n$, generate $z^i$ from $N(0, 1)$ truncated to
$(-\infty, \theta x^i]$ if $y^i = 1$ and truncated to $(\theta x^i, \infty)$ if
$y^i = 0$. Then update $\theta$ from $N(0, 1)$ truncated to the interval

$$\left[ \max_{i:\, y^i = 1} \frac{z^i}{x^i},\ \min_{i:\, y^i = 0} \frac{z^i}{x^i} \right).$$

Write $\mathcal{M}_n$ for this Gibbs sampler.

1.1.2 Gibbs sampler $\mathcal{N}_n$

Similarly, we define another Gibbs sampler by taking a latent variable $z^i$
from $N(-\theta x^i, 1)$, the normal distribution with mean $-\theta x^i$ and
variance 1, and setting

$$y = \begin{cases} 1 & \text{if } z \le 0, \\ 0 & \text{if } z > 0. \end{cases}$$

Then the Gibbs sampler is generated by iterating the following procedure: for
given $\theta$ and for $i = 1, \ldots, n$, generate $z^i$ from $N(-\theta x^i, 1)$
truncated to $(-\infty, 0]$ if $y^i = 1$ and truncated to $(0, \infty)$ if
$y^i = 0$. Then update $\theta$ from the normal distribution with mean $\mu$ and
variance $\sigma^2$ defined by

$$\mu = -\frac{\sum_{i=1}^n x^i z^i}{1 + \sum_{i=1}^n (x^i)^2}, \qquad \sigma^2 = \frac{1}{1 + \sum_{i=1}^n (x^i)^2}.$$

Write $\mathcal{N}_n$ for this Gibbs sampler.

We obtain two Gibbs samplers $\mathcal{M}_n$ and $\mathcal{N}_n$. Although the
constructions are similar and both are geometrically ergodic, their
performances are different. Figure 1 shows trajectories of the Gibbs sampler
sequence

$$\theta(0), \ldots, \theta(m-1)$$

for $m = 200$ iterations and sample sizes $n = 100$ (upper) and $n = 1000$ (lower).
For each sample size, by ergodicity, the empirical distributions of the two
Gibbs samplers tend to the same posterior distribution of $\theta$ as
$m \to \infty$. However, the solid line ($\mathcal{M}_n$) shows poorer mixing than
the dashed line ($\mathcal{N}_n$). Therefore $\mathcal{M}_n$ may produce a poor
estimate of the posterior distribution.

Figure 1: Trajectories of the Gibbs samplers for sample size n = 100 (upper) and
n = 1000 (lower). The solid line is for $\mathcal{M}_n$ and the dashed line is for
$\mathcal{N}_n$.

The difference becomes larger when the sample size is $n = 1000$ in Figure
1 (lower). The trajectory from $\mathcal{M}_n$ (solid line) is almost constant.
For both simulations, the true value is $\theta_0 = 2$ and the initial value
$\theta(0)$ is set to 1.5.
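The comparison above can be reproduced in outline with the following sketch. It is not the code behind Figure 1: the helper names (`trunc_norm`, `gibbs_m`, `gibbs_n`, `mean_abs_step`) are hypothetical, truncated normals are drawn by inverse-CDF sampling, and the update of $\theta$ in `gibbs_n` uses the conjugate normal conditional implied by the $N(0,1)$ prior and $z^i \sim N(-\theta x^i, 1)$, which is our own derivation from the stated model.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()   # standard normal N(0, 1)
EPS = 1e-12          # keeps inverse-CDF arguments strictly inside (0, 1)


def trunc_norm(rng, mean, lo, hi):
    """Draw from N(mean, 1) truncated to (lo, hi) by inverting the CDF."""
    a, b = STD.cdf(lo - mean), STD.cdf(hi - mean)
    u = a + (b - a) * rng.random()
    return mean + STD.inv_cdf(min(max(u, EPS), 1.0 - EPS))


def simulate_data(rng, n, theta0):
    """x uniform on (0, 1], y = 1 with probability Phi(theta0 * x)."""
    x = [1.0 - rng.random() for _ in range(n)]
    y = [1 if rng.random() < STD.cdf(theta0 * xi) else 0 for xi in x]
    return x, y


def gibbs_m(rng, x, y, theta, m):
    """Sampler M_n: latent z ~ N(0, 1), y = 1 iff z <= theta * x."""
    path = [theta]
    for _ in range(m - 1):
        lo, hi = -math.inf, math.inf
        for xi, yi in zip(x, y):
            if yi == 1:
                z = trunc_norm(rng, 0.0, -math.inf, theta * xi)
                lo = max(lo, z / xi)
            else:
                z = trunc_norm(rng, 0.0, theta * xi, math.inf)
                hi = min(hi, z / xi)
        theta = trunc_norm(rng, 0.0, lo, hi)   # N(0, 1) prior truncated to [lo, hi)
        path.append(theta)
    return path


def gibbs_n(rng, x, y, theta, m):
    """Sampler N_n: latent z ~ N(-theta * x, 1), y = 1 iff z <= 0."""
    var = 1.0 / (1.0 + sum(xi * xi for xi in x))   # conjugate normal variance
    path = [theta]
    for _ in range(m - 1):
        sxz = 0.0
        for xi, yi in zip(x, y):
            if yi == 1:
                z = trunc_norm(rng, -theta * xi, -math.inf, 0.0)
            else:
                z = trunc_norm(rng, -theta * xi, 0.0, math.inf)
            sxz += xi * z
        theta = rng.gauss(-var * sxz, math.sqrt(var))
        path.append(theta)
    return path


def mean_abs_step(path):
    """Average absolute one-step move of a trajectory."""
    return sum(abs(b - a) for a, b in zip(path, path[1:])) / (len(path) - 1)
```

With data from `simulate_data(rng, n, 2.0)` and both chains started at 1.5, comparing `mean_abs_step` of the two trajectories gives a crude numerical proxy for the mixing difference described above; the gap is expected to widen as `n` grows.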

Even though $\mathcal{M}_n$ is geometrically ergodic, it has a poor mixing
property. We would like to say that $\{\mathcal{M}_n; n = 1, 2, \ldots\}$ is
degenerate. Later we will prove that it is degenerate after a certain
localization. On the other hand, $\{\mathcal{N}_n; n = 1, 2, \ldots\}$ is
consistent under the same scaling by Theorem 6.4 of [5].

We study such poor behavior, degeneracy, in this paper. The analysis may
seem to be just a formalization of obvious facts. However, degeneracy is
sometimes not directly visible, and it produces non-intuitive results. In this
paper, we obtain the following results for (Markov chain) Monte Carlo methods.

1. Degeneracy and local degeneracy of Monte Carlo procedures are defined
and analyzed.

2. As an example, we study the cumulative link model. The marginal
augmentation method is known to work at least as well as the original Gibbs
sampler. We show that in some cases marginal augmentation really improves the
asymptotic property, and that in the remaining cases, surprisingly, neither
of the MCMC methods is locally consistent.

The paper is organized as follows. Section 2 is devoted to a study of
degeneracy of Monte Carlo procedures in general. In Subsection 2.1 we briefly
review consistency of Monte Carlo procedures, and after that we define
degeneracy and apply it to Markov chain Monte Carlo procedures. Next we examine
degeneracy for an example, the cumulative link model. Section 3 prepares the
asymptotic properties of the cumulative link model itself; no Monte Carlo
procedure appears in that section. In Section 4 we apply degeneracy to the model
and obtain asymptotic properties of Markov chain Monte Carlo methods for the
cumulative link model.

1.2 Notation 

Let $\mathbb{N} = \{1, 2, \ldots\}$ and $\mathbb{N}_0 = \{0, 1, 2, \ldots\}$. We
write the integer part of $x \in \mathbb{R}$ as $[x]$.

1.2.1 Probability measure, Transition kernel 

For a measurable space $(E, \mathcal{E})$, the space of probability measures on
$(E, \mathcal{E})$ is denoted by $\mathcal{P}(E)$.

For two measurable spaces $(E, \mathcal{E})$ and $(F, \mathcal{F})$, a probability
transition kernel $K$ from $E$ to $F$ is a map $K : E \times \mathcal{F} \to [0, 1]$
such that

1. $K(x, \cdot)$ is a probability measure on $(F, \mathcal{F})$ for $x \in E$.

2. $K(\cdot, A)$ is $\mathcal{E}$-measurable for any $A \in \mathcal{F}$.

We may write $K(dy|x)$ instead of $K(x, dy)$. If $K(x, \cdot)$ is a $\sigma$-finite
measure instead of a probability measure, we call $K$ a transition kernel.

1.2.2 Normal distribution 

Write $\phi(x) = \exp(-x^2/2)/\sqrt{2\pi}$ for the probability density function of
$N(0, 1)$ and write $\Phi(x) = \int_{-\infty}^x \phi(y)\,dy$. For $\mu \in \mathbb{R}^p$
and a $p \times p$ positive definite matrix $\Sigma$, the function
$\phi(x; \mu, \Sigma) = \exp(-(x-\mu)^T \Sigma^{-1} (x-\mu)/2) / ((2\pi)^p \det(\Sigma))^{1/2}$
is the probability density function of $N(\mu, \Sigma) = N_p(\mu, \Sigma)$, where
$\det(\Sigma)$ is the determinant of $\Sigma$ and $x^T$ is the transpose of a
vector $x \in \mathbb{R}^p$.

1.2.3 Central value



For a probability measure $\mu$ on $\mathbb{R}$, a central value is a point $x \in \mathbb{R}$ satisfying



An element of $\mathbb{R}^p$ is denoted by $x = (x^1, \ldots, x^p)^T$. For a
probability measure $\mu$ on $\mathbb{R}^p$, let $\mu^i(A)$ be
$\int_{\mathbb{R}^p} 1_A(x^i)\,\mu(dx)$ for $A \in \mathcal{B}(\mathbb{R})$. For $\mu$,
we call $x = (x^1, x^2, \ldots, x^p)^T \in \mathbb{R}^p$ a central value if each
$x^i$ is a central value of $\mu^i$. There is no practical reason for using the
central value in the Markov chain Monte Carlo procedures of this paper; we use
it because of its existence and continuity. That is, (a) the mean of the
posterior distribution $P_n(d\theta|x_n)$ does not always exist, but the central
value does and is moreover unique, and (b) if $\mu_n \to \mu$, then the central
value of $\mu_n$ tends to that of $\mu$. See [4].

2 Degeneracy of Markov chain Monte Carlo procedure

In this section, we introduce the notions of degeneracy and local degeneracy of
a Monte Carlo procedure. We use the same framework as [5] to describe local
degeneracy. In that approach, a Monte Carlo procedure is considered to be a
pair of a random probability measure and transition kernels. We briefly review
the framework in Subsection 2.1.

2.1 Consistency and local consistency 

In this subsection, we give a quick review of the framework of [5]. Let
$(S, \mathcal{S})$ be a measurable space. Let $(S^{\mathbb{N}_0}, \mathcal{S}^{\mathbb{N}_0})$
be a countable product of $(S, \mathcal{S})$. Each element of $S^{\mathbb{N}_0}$ is
denoted by $s_\infty = (s(0), s(1), \ldots)$ and its first $m$ coordinates are
denoted by $s_m = (s(0), \ldots, s(m-1))$. Let $(\Theta, d)$ be a complete separable
metric space equipped with Borel $\sigma$-algebra $\Xi$. We first define a
non-random Monte Carlo procedure. The meaning of "non-random" will be clear
after we define the "random" Monte Carlo procedure in Definition 2.4 and the
standard Gibbs sampler in Definition 2.9.

Definition 2.1 (Non-random Monte Carlo procedure). A pair $\mathcal{M} = (M, e)$ is
said to be a non-random Monte Carlo procedure on $(S, \Theta)$, where $M$ is a
probability measure on $S^{\mathbb{N}_0}$ and $e = (e_m; m = 1, 2, \ldots)$ is a
sequence of probability transition kernels $e_m$ from $S^m$ to $\Theta$.

The simplest example of a non-random Monte Carlo procedure is the non-random
crude Monte Carlo procedure.

Example 2.2 (Crude Monte Carlo). If we want to calculate an integral
$\int_\Theta f(\theta)\,\Pi(d\theta)$ for a probability measure $\Pi$ and a
measurable function $f$, one approach is to generate an i.i.d. sequence
$\theta(0), \theta(1), \ldots$ from $\Pi$ and calculate
$m^{-1} \sum_{i=0}^{m-1} f(\theta(i))$. In this case $S = \Theta$ and we write
$\theta_m$ and $\theta_\infty$ instead of $s_m$ and $s_\infty$. This simple Monte
Carlo method is sometimes called a crude Monte Carlo method. We can describe it
as a non-random Monte Carlo procedure. Let $M$ be a countable product of the
probability measure $\Pi$ on $\Theta$ (that is, $M = \Pi^{\otimes \mathbb{N}_0}$)
and $e_m$ be

$$e_m(\theta_m, \cdot) = \frac{1}{m} \sum_{i=0}^{m-1} \delta_{\theta(i)}$$

where $\delta_\theta$ is a Dirac measure. Then
$\int_\Theta f(\theta)\, e_m(\theta_m, d\theta) = m^{-1} \sum_{i=0}^{m-1} f(\theta(i))$.
We call $(M, e)$ a crude Monte Carlo procedure.
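As a concrete illustration of Example 2.2, the following minimal sketch draws an i.i.d. sequence and evaluates both the average of $f$ and the empirical measure $e_m$ of a set. The target $\Pi = N(0, 1)$, the integrand $f(\theta) = \theta^2$, and the function names are assumptions chosen only for illustration.

```python
import random


def crude_monte_carlo(f, sampler, m, rng):
    """Draw theta(0), ..., theta(m-1) i.i.d. and return them with the average of f."""
    draws = [sampler(rng) for _ in range(m)]
    return draws, sum(f(t) for t in draws) / m


def empirical_measure(draws, indicator):
    """e_m(theta_m, A): the fraction of draws that fall in the set A."""
    return sum(1 for t in draws if indicator(t)) / len(draws)


rng = random.Random(1)
# Hypothetical target: Pi = N(0, 1) and f(theta) = theta^2, whose integral is 1.
draws, estimate = crude_monte_carlo(
    lambda t: t * t, lambda r: r.gauss(0.0, 1.0), 20000, rng
)
mass_positive = empirical_measure(draws, lambda t: t > 0.0)  # Pi((0, inf)) = 1/2
```

Here `estimate` approximates $\int \theta^2\, \Pi(d\theta) = 1$ and `mass_positive` approximates $\Pi((0, \infty)) = 1/2$, both with error of order $m^{-1/2}$.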

Example 2.3 (Accept-reject method). The accept-reject method generates an
i.i.d. sequence from $\Pi$ on $\Theta$ using another probability measure $Q$.
Assume that $\Pi$ is absolutely continuous with respect to $Q$ and for some
$M < \infty$,

$$r(\theta) := \frac{d\Pi}{dQ}(\theta) \le M \qquad (\theta \in \Theta).$$

Generate i.i.d. sequences $\theta(0), \theta(1), \ldots$ from $Q$ and
$u(0), u(1), \ldots$ from the uniform distribution $U[0, 1]$. Then the
accept-reject method approximates $\Pi$ by

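$$\frac{\sum_{i=0}^{m-1} 1_{\{u(i) \le r(\theta(i))/M\}}\, \delta_{\theta(i)}}{\sum_{i=0}^{m-1} 1_{\{u(i) \le r(\theta(i))/M\}}}. \tag{2.2}$$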

We can describe it as a non-random Monte Carlo procedure. Let $M$ be a
countable product of the probability measure $Q \otimes U$ on
$S := \Theta \times [0, 1]$ (that is, $M = (Q \otimes U)^{\otimes \mathbb{N}_0}$) and
let $e_m(s_m, \cdot)$ be as in (2.2), where $s_m = (s(0), s(1), \ldots, s(m-1))$
and $s(i) = (\theta(i), u(i)) \in S$. We call $(M, e)$ an accept-reject procedure
for $e = (e_m; m = 1, 2, \ldots)$.
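A minimal sketch of the accept-reject procedure follows. It reads the estimator in (2.2) as the empirical measure of the accepted draws, which is an assumption on our part, and the target $\Pi = \mathrm{Beta}(2, 2)$ with proposal $Q = U[0, 1]$ is a hypothetical example for which $r(\theta) = 6\theta(1-\theta) \le 1.5$.

```python
import random


def accept_reject(r, bound, proposal, m, rng):
    """Return the accepted draws among m proposals theta(i) ~ Q with u(i) ~ U[0, 1]."""
    accepted = []
    for _ in range(m):
        theta, u = proposal(rng), rng.random()
        if u <= r(theta) / bound:        # acceptance rule u <= r(theta) / M
            accepted.append(theta)
    return accepted


# Hypothetical example: Pi = Beta(2, 2) on [0, 1] and Q = U[0, 1],
# so that r(theta) = 6 * theta * (1 - theta) <= 1.5 =: M.
rng = random.Random(2)
m = 20000
accepted = accept_reject(
    lambda t: 6.0 * t * (1.0 - t), 1.5, lambda g: g.random(), m, rng
)
acceptance_rate = len(accepted) / m             # close to 1 / M in expectation
mean_estimate = sum(accepted) / len(accepted)   # integral of theta under e_m
```

The accepted draws are i.i.d. from $\Pi$, so `mean_estimate` approximates the mean of $\mathrm{Beta}(2, 2)$, which is $1/2$, while `acceptance_rate` approximates $1/M = 2/3$.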

Now we consider a random Monte Carlo procedure. Let $(X, \mathcal{X}, P)$ be a
probability space.

Definition 2.4 (Monte Carlo procedure). A pair $\mathcal{M} = (M, e)$ is said to be
a Monte Carlo procedure defined on $(X, \mathcal{X}, P)$ on $(S, \Theta)$, where $M$
is a probability transition kernel from $X$ to $S^{\mathbb{N}_0}$, that is,

1. $M(x, \cdot)$ is a probability measure on $(S^{\mathbb{N}_0}, \mathcal{S}^{\mathbb{N}_0})$,

2. $M(\cdot, A_\infty)$ is $\mathcal{X}$-measurable for any $A_\infty \in \mathcal{S}^{\mathbb{N}_0}$,

and $e = (e_m; m = 1, 2, \ldots)$ is a sequence of probability transition kernels
$e_m$ from $X \times S^m$ to $\Theta$.

We call $\mathcal{M}$ stationary if $M(x, \cdot)$ is (strictly) stationary for
$P$-a.s. $x$. Stationarity plays an important role in the asymptotic behavior of
Monte Carlo procedures.

Markov chain Monte Carlo procedures form a class of Monte Carlo procedures.
Let $\mu$ be a probability transition kernel from $X$ to $S$ and $K$ be a
probability transition kernel from $X \times S$ to $S$. We call a probability
transition kernel $M$ from $X$ to $S^{\mathbb{N}_0}$ a random Markov measure
generated by $(\mu, K)$ if $M(x, \cdot)$ is a Markov measure having initial
probability distribution $\mu(x, \cdot)$ and probability transition kernel
$K(x, \cdot, \cdot)$, that is,

$$M(x, ds_\infty) = \mu(x, ds(0))\, K(x, s(0), ds(1))\, K(x, s(1), ds(2)) \cdots$$

Definition 2.5 (Markov chain Monte Carlo procedure). If a Monte Carlo procedure
$\mathcal{M} = (M, e)$ has a random Markov measure as $M$, we call $\mathcal{M}$ a
Markov chain Monte Carlo procedure defined on $(X, \mathcal{X}, P)$ on $(S, \Theta)$.

As a measure of efficiency we define consistency and local consistency for a
sequence of Monte Carlo procedures. Let $(X_n, \mathcal{X}_n, P_n)$ be a
probability space, $(S_n, \mathcal{S}_n)$ be a measurable space and
$(\Theta_n, d_n)$ be a complete separable metric space equipped with Borel
$\sigma$-algebra $\Xi_n$ for $n = 1, 2, \ldots$. Let $\mathcal{M}_n = (M_n, e_n)$,
where $e_n = (e_{n,m}; m = 1, 2, \ldots)$, be a Monte Carlo procedure on
$(X_n, \mathcal{X}_n, P_n)$ on $(S_n, \Theta_n)$ for $n = 1, 2, \ldots$.

The purpose of the Monte Carlo procedure is to approximate a sequence of
probability transition kernels $(\Pi_n; n = 1, 2, \ldots)$ from $X_n$ to $\Theta_n$.
Let $w_n$ be the bounded Lipschitz metric for probability measures on
$(\Theta_n, \Xi_n)$ defined by $d_n$. Then

$$w_n(e_{n,m}(s_m, \cdot), \Pi_n(x_n, \cdot))$$

measures the loss of the approximation of $\Pi_n(x_n, \cdot)$ by
$e_{n,m}(s_m, \cdot)$. The quantity

$$W_m(M_n(x_n, \cdot), \Pi_n(x_n, \cdot)) := \int_{s_\infty \in S_n^{\mathbb{N}_0}} w_n(e_{n,m}(s_m, \cdot), \Pi_n(x_n, \cdot))\, M_n(x_n, ds_\infty)$$

is the average loss with respect to $s_\infty$. We define the risk of using the
Monte Carlo procedure $\mathcal{M}_n$ up to $m$ for the approximation of $\Pi_n$ by

$$R_m(\mathcal{M}_n, \Pi_n) := \int_{x_n \in X_n} W_m(M_n(x_n, \cdot), \Pi_n(x_n, \cdot))\, P_n(dx_n).$$

Definition 2.6 (Consistency). A sequence of Monte Carlo procedures
$(\mathcal{M}_n; n = 1, 2, \ldots)$ is said to be consistent to
$(\Pi_n; n = 1, 2, \ldots)$ if $R_{m_n}(\mathcal{M}_n, \Pi_n) \to 0$ for any
$m_n \to \infty$.

When $\Pi_n(x_n, \cdot)$ tends to a point mass, the above consistency does not
provide good information. In such a case, we consider local consistency. Let
$\Theta_n = \Theta \subset \mathbb{R}^p$ and let a centering
$\hat{\theta}_n : X_n \to \Theta$ be measurable. We consider the scaling
$\theta \mapsto n^{1/2}(\theta - \hat{\theta}_n)$. Let

$$\Pi^*_n(x_n, A) := \int 1_A(n^{1/2}(\theta - \hat{\theta}_n))\, \Pi_n(x_n, d\theta),$$
$$e^*_{n,m}(x_n, s_m, A) := \int 1_A(n^{1/2}(\theta - \hat{\theta}_n))\, e_{n,m}(x_n, s_m, d\theta).$$

Let $\mathcal{M}^*_n = (M_n, e^*_n)$ for $e^*_n = (e^*_{n,m}; m = 1, 2, \ldots)$.

Definition 2.7 (Local consistency). If $(\mathcal{M}^*_n; n = 1, 2, \ldots)$ is
consistent to $(\Pi^*_n; n = 1, 2, \ldots)$, then $(\mathcal{M}_n; n = 1, 2, \ldots)$
is said to be locally consistent to $(\Pi_n; n = 1, 2, \ldots)$.

Remark 2.8. This scaling is just one example. In other cases, such as the
mixture model considered in [6], the appropriate scaling is
$\theta \mapsto n^{1/2} \epsilon_n^{-1} (\theta - \hat{\theta}_n)$ where
$\hat{\theta}_n = 0$ for some $\epsilon_n \to 0$. Moreover, the scaling factor
($n^{1/2}$ or $n^{1/2} \epsilon_n^{-1}$ in the above example) may depend on the
observation. However, for the current paper, it is sufficient to consider the
above scaling $\theta \mapsto n^{1/2}(\theta - \hat{\theta}_n)$.

At the end of the subsection, we briefly review the definition of the standard
Gibbs sampler and extend it to a non-i.i.d. structure. Let $(\Theta, d)$ be a
complete separable metric space equipped with Borel $\sigma$-algebra $\Xi$. Let
$(X_n, \mathcal{X}_n)$ and $(Y_n, \mathcal{Y}_n)$ be measurable spaces. Assume the
existence of probability transition kernels

$$P_n(d\theta|x_n, y_n), \quad P_n(dy_n|x_n, \theta), \quad P_n(d\theta|x_n)$$

with probability measures $P_n(dx_n, dy_n, d\theta)$. Assume we have the relations

$$P_n(dx_n, dy_n, d\theta) = P_n(d\theta|x_n, y_n)\, P_n(dx_n, dy_n) = P_n(dy_n|x_n, \theta)\, P_n(dx_n, d\theta)$$

where $P_n(dx_n, d\theta)$ and $P_n(dx_n, dy_n)$ are marginal distributions of
$P_n(dx_n, dy_n, d\theta)$. Moreover, we assume

$$P_n(dx_n, d\theta) = P_n(d\theta|x_n)\, P_n(dx_n)$$

where $P_n(dx_n)$ is also a marginal distribution. Let

$$\mu_n(x_n, ds) = P_n(dy_n|x_n, \theta)\, P_n(d\theta|x_n), \qquad K_n(x_n, s, ds^*) = P_n(dy^*_n|x_n, \theta)\, P_n(d\theta^*|x_n, y^*_n)$$

for $s = (y_n, \theta)$ and $s^* = (y^*_n, \theta^*)$. Let
$e_n = (e_{n,m}; m = 1, 2, \ldots)$ be

$$e_{n,m}(x_n, s_m, A) = m^{-1} \sum_{i=0}^{m-1} 1_A(\theta(i)) \qquad (A \in \Xi)$$

where $s_m = (s(0), \ldots, s(m-1))$ and $s(i) = (y(i), \theta(i))$.

Definition 2.9 (Sequence of standard Gibbs samplers). Set $M_n$ as a random
Markov measure generated by $(\mu_n, K_n)$. Then
$(\mathcal{M}_n = (M_n, e_n); n = 1, 2, \ldots)$ is called a sequence of standard
Gibbs samplers defined on $(X_n, \mathcal{X}_n, P_n)$ on $(Y_n \times \Theta, \Theta)$.

Using the abbreviation defined in the next subsection, we can write
$\mathcal{M}_n = (M_n, \theta)$.

2.2 Abbreviations 

The framework described in the previous subsection is useful as a formal
definition of Monte Carlo procedures. However, it is sometimes inconvenient to
write down $e = (e_m; m = 1, 2, \ldots)$ every time. In this paper we use two
abbreviations to denote a Monte Carlo procedure $(M, e)$. The first one is an
abbreviation for a class of empirical distributions. All examples of $e$ in
the rest of the paper have the following form:

$$e_m(x, s_m, A) = \frac{1}{m} \sum_{i=0}^{m-1} 1_A(E(s(i)))$$

where $s_m = (s(0), \ldots, s(m-1))$ and $E : S \to \Theta$. Then we write $(M, E)$
for $(M, e)$. We also use the notation $(M, E(s))$. For example, if
$s = (y, \theta)$ and $E(s) = \theta \in \Theta$, then $(M, E)$ is denoted by
$(M, \theta)$. If $S = \Theta$ and $E$ is the identity map, we write
$(M, \mathrm{id})$.

The second abbreviation concerns transformations. Let $F : \Theta \to \Psi$ where
$\Psi$ is a Polish space. For a probability transition kernel $\mu(x, d\theta)$,
we define

$$\mu^F(x, A) = \int_\Theta 1_A(F(\theta))\, \mu(x, d\theta).$$

Similarly, we define

$$e^F_m(x, s_m, A) = \int_\Theta 1_A(F(\theta))\, e_m(x, s_m, d\theta)$$

for $e = (e_m; m = 1, 2, \ldots)$. Set $e^F = (e^F_m; m = 1, 2, \ldots)$ and
$\mathcal{M}^F = (M, e^F)$. Then $\mathcal{M}^F$ is a Monte Carlo procedure defined on
$(X, \mathcal{X}, P)$ on $(S, \Psi)$ and we call $\mathcal{M}^F$ a transform of
$\mathcal{M}$. For example, if $\mathcal{M} = (M, E)$, then
$\mathcal{M}^F = (M, F \circ E)$, where $(F \circ E)(s) = F(E(s))$.

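Now we consider a localization of a transform $\mathcal{M}^F$. Let
$\Theta \subset \mathbb{R}^p$ and $\Psi \subset \mathbb{R}^q$ be open sets. For
$\tau \in \Psi$, we define the scaling $\tau \mapsto n^{1/2}(\tau - \hat{\tau}_n)$
where $\hat{\tau}_n = F(\hat{\theta}_n)$. We write
$\mathcal{M}^{F*}_n = (M_n, e^{F*}_n)$ and $\Pi^{F*}_n$ for the scaling of
$\mathcal{M}^F_n = (M_n, e^F_n)$ and $\Pi^F_n$, respectively, that is,

$$\Pi^{F*}_n(x_n, A) := \int 1_A(n^{1/2}(F(\theta) - F(\hat{\theta}_n)))\, \Pi_n(x_n, d\theta),$$
$$e^{F*}_{n,m}(x_n, s_m, A) := \int 1_A(n^{1/2}(F(\theta) - F(\hat{\theta}_n)))\, e_{n,m}(x_n, s_m, d\theta), \tag{2.3}$$

where $e^{F*}_n = (e^{F*}_{n,m}; m = 1, 2, \ldots)$. We say
$(\mathcal{M}^F_n; n = 1, 2, \ldots)$ is locally consistent to
$(\Pi^F_n; n = 1, 2, \ldots)$ if $(\mathcal{M}^{F*}_n; n = 1, 2, \ldots)$ is consistent
to $(\Pi^{F*}_n; n = 1, 2, \ldots)$. The following lemma states that
$(\mathcal{M}^F_n; n = 1, 2, \ldots)$ is locally consistent if
$(\mathcal{M}_n; n = 1, 2, \ldots)$ is.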

Lemma 2.10. Let $F : \Theta \to \Psi$ be a $C^1$ map except on a compact set $N$ of
$\Theta$. Assume $P_n(\hat{\theta}_n \in N) \to 0$ and that both the law of
$\hat{\theta}_n$ and $\int_{X_n} P_n(dx_n)\, \Pi^*_n(x_n, \cdot)$ are tight. Then
$(\mathcal{M}^F_n; n = 1, 2, \ldots)$ is locally consistent to
$(\Pi^F_n; n = 1, 2, \ldots)$ if $(\mathcal{M}_n; n = 1, 2, \ldots)$ is locally
consistent to $(\Pi_n; n = 1, 2, \ldots)$.

Proof. Let $B_r(u) = \{v \in \mathbb{R}^p; d(u, v) < r\}$. Fix $m_n \to \infty$ and
$r_n \to 0$ such that $r_n n^{1/2} \to \infty$.

We first remark that for any $\epsilon > 0$, there exists $\delta > 0$ such that, for
$N^\delta = \{x \in \Theta; d(x, y) < \delta \text{ for some } y \in N\}$, there exists a
compact set $K \subset (N^\delta)^c$ such that

$$\limsup_{n \to \infty} P_n(\hat{\theta}_n \in K^c) < \epsilon.$$

Second, we consider




-4) 



Then by assumption, both $\Pi_n(B_{r_n}(\hat{\theta}_n)^c)$ and
$e_{n,m_n}(B_{r_n}(\hat{\theta}_n)^c)$ tend to 0.
By the differentiability of $F$, we have




and hence we can replace $n^{1/2}(F(\theta) - F(\hat{\theta}_n))$ in (2.3) by
$dF(\hat{\theta}_n)^T n^{1/2}(\theta - \hat{\theta}_n)$ if $\hat{\theta}_n \in K$.



$$W_{m_n}(M^{F*}_n(x_n), \Pi^{F*}_n(x_n)) \le o_P(1) + |dF(\hat{\theta}_n)|\, W_{m_n}(M^*_n(x_n), \Pi^*_n(x_n))$$



which means local consistency of $(\mathcal{M}^F_n; n = 1, 2, \ldots)$ to
$(\Pi^F_n; n = 1, 2, \ldots)$. □

Roughly speaking, this lemma says that if $\mathcal{M}_n$ is "equivalent" to
$\mathcal{N}^F_n$ for some $F$ and $(\mathcal{N}_n; n = 1, 2, \ldots)$ is locally
consistent, then $(\mathcal{M}_n; n = 1, 2, \ldots)$ is also locally consistent.

We define the minimal representation and equivalence of a Monte Carlo
procedure $(M, E)$.

Definition 2.11 (Minimal representation, equivalence). Let $(M, E)$ be a Monte
Carlo procedure for $E : S \to \Theta$. For a realization
$s_\infty = (s(0), s(1), \ldots)$ of $M(x, \cdot)$, we write $M^E(x, \cdot)$ for the
law of $(E(s(0)), E(s(1)), \ldots)$, that is,

$$M^E(x, A) = \int 1_A((E(s(0)), E(s(1)), \ldots))\, M(x, ds_\infty).$$

Then we call $(M^E, \mathrm{id})$ a minimal representation of
$\mathcal{M} = (M, E)$. If two Monte Carlo procedures $\mathcal{M}, \mathcal{N}$ have
the same minimal representation, we call $\mathcal{M}, \mathcal{N}$ equivalent.

Note that even if $\mathcal{M}$ is a Markov chain Monte Carlo procedure, a minimal
representation may lose the Markov property of the original Monte Carlo
procedure.

2.3 Degeneracy and local degeneracy

We define degeneracy of Monte Carlo procedure. Let



Using this replacement, we have



and

Note that for the bounded Lipschitz metric $w$ for probability measures on a
metric space $(D, d)$,

$$w(\mu, \delta_x) = \int d(x, y)\, \mu(dy) \tag{2.4}$$

where $\delta_x$ is a Dirac measure.

Definition 2.12. A sequence of Monte Carlo procedures
$(\mathcal{M}_n; n = 1, 2, \ldots)$ on $(X_n, \mathcal{X}_n, P_n)$ on $(S_n, \Theta)$ is
said to be degenerate if $R'_m(\mathcal{M}_n) \to 0$ for any $m \in \mathbb{N}$.
If $(\mathcal{M}^*_n; n = 1, 2, \ldots)$ is degenerate, we call
$(\mathcal{M}_n; n = 1, 2, \ldots)$ locally degenerate.

Remark 2.13. As a measure of poor behavior, degeneracy is sometimes too broad.
Roughly speaking, among degenerate Monte Carlo procedures there are relatively
good ones and bad ones. Even if a Monte Carlo procedure is degenerate, it
sometimes tends to $\Pi_n$ at a slower rate. This convergence property is called
weak consistency in [6], although the terminology in that paper is slightly
different from the current one. We can distinguish degenerate Monte Carlo
procedures by this rate.

The following is an example of a non-random Markov chain Monte Carlo
procedure. Let $B_r(x) = \{y \in \mathbb{R}^p; |x - y| < r\}$.

Example 2.14. Let $\Theta_n = S_n = \Theta = \mathbb{R}^p$. Let
$\mathcal{M}_n = (M_n, \mathrm{id})$ be a non-random Markov chain Monte Carlo
procedure on $\Theta$ where $M_n$ is generated by $(\mu_n, K_n)$. Let $R_n$ be a
probability transition kernel from $\Theta$ to itself and
$A_n : \Theta \to (0, 1)$ (open interval) be a measurable map. Assume
$K_n(x, dy) = A_n(x) R_n(x, dy) + (1 - A_n(x)) \delta_x(dy)$.

We can show that if $(\mu_n; n = 1, 2, \ldots)$ is tight and (a) if
$\sup_{x \in K} A_n(x) \to 0$ for any compact set $K$, or (b) if
$\sup_{x \in K} R_n(x, B_\epsilon(x)^c) \to 0$ for any compact set $K$ and
$\epsilon > 0$, then $(\mathcal{M}_n; n = 1, 2, \ldots)$ is degenerate.

To show (a), fix $m \in \mathbb{N}$ and $\epsilon > 0$. By assumption, there exists
a compact set $K$ such that

$$\limsup_{n \to \infty} \mu_n(K^c) < \epsilon, \qquad \sup_{x \in K} A_n(x) \to 0.$$

Let $E_m := \{\theta_\infty; \theta(0) = \theta(1) = \cdots = \theta(m-1)\}$, which is
the event without any jump in the first $m$ steps. Then

$$\limsup_{n \to \infty} M_n(E_m^c) \le 1 - \liminf_{n \to \infty} \int_{x \in K} (1 - A_n(x))^m\, \mu_n(dx) < \epsilon.$$

On the event $E_m$, $w(e_m(\theta_m), e_1(\theta_1)) = 0$ where
$e_m(\theta_m) = m^{-1} \sum_{i=0}^{m-1} \delta_{\theta(i)}$.
Hence $\limsup_{n \to \infty} R'_m(\mathcal{M}_n) < \epsilon$, which proves the first
claim.

To show (b), as above, fix $m \in \mathbb{N}$ and $\epsilon > 0$. Let
$E_m := \{\theta_\infty; d(\theta(i), \theta(i+1)) < \epsilon/2m\ (i = 0, \ldots, m-2)\}$
be the event that the chain does not move far from the initial point
$\theta(0)$ in the first $m$ steps. By assumption, there exists a compact set $K$
such that

$$\limsup_{n \to \infty} \mu_n(K^c) < \epsilon/2, \qquad \sup_{x \in K^\epsilon} R_n(x, B_{\epsilon/2m}(x)^c) \to 0$$

where $K^\epsilon = \{x \in \Theta; \exists y \in K \text{ s.t. } d(x, y) < \epsilon\}$. Then
$\limsup_{n \to \infty} M_n(E_m^c) < \epsilon/2$ and on the event $E_m$,

$$w(e_m(\theta_m), e_1(\theta_1)) \le m^{-1} \sum_{i=1}^{m-1} d(\theta(0), \theta(i)) \le \sum_{i=0}^{m-2} d(\theta(i), \theta(i+1)) < \epsilon/2.$$

Hence $\limsup_{n \to \infty} R'_m(\mathcal{M}_n) < \epsilon$, which proves the second
claim.
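Case (a) of Example 2.14 can be checked numerically with the following sketch. The specific choices are assumptions made only for illustration: a constant jump probability $A_n(x) = 1/n$, a proposal $R_n(x, \cdot) = N(0, 1)$ that ignores $x$, and an initial distribution $\mu_n = N(0, 1)$. Under these choices the probability of the event $E_m$ (no move in the first $m$ steps) is $(1 - 1/n)^{m-1}$, which tends to 1 as $n$ grows.

```python
import random


def lazy_chain(a, m, rng):
    """First m states of a chain with kernel a * R(x, dy) + (1 - a) * delta_x(dy).

    Illustration-only assumptions: constant jump probability a,
    R(x, .) = N(0, 1) for every x, and initial distribution N(0, 1).
    """
    theta = rng.gauss(0.0, 1.0)
    path = [theta]
    for _ in range(m - 1):
        if rng.random() < a:             # jump with probability a
            theta = rng.gauss(0.0, 1.0)
        path.append(theta)               # otherwise stay at the current state
    return path


def frozen_fraction(a, m, reps, rng):
    """Monte Carlo estimate of the probability of E_m (no move in m steps)."""
    frozen = 0
    for _ in range(reps):
        path = lazy_chain(a, m, rng)
        frozen += all(t == path[0] for t in path)
    return frozen / reps


rng = random.Random(3)
m = 10
fractions = {n: frozen_fraction(1.0 / n, m, 2000, rng) for n in (2, 100, 10000)}
# Exact value under these assumptions: (1 - 1/n) ** (m - 1).
```

On $E_m$ the empirical measure $e_m$ coincides with $e_1$, so a frozen fraction close to 1 is exactly the situation in which $R'_m(\mathcal{M}_n)$ is small.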

A sequence of consistent Monte Carlo procedures can be degenerate. However,
this is a very special case. In particular, we have the following proposition.
We call a sequence of probability transition kernels $\Pi_n$ from $X_n$ to
$\Theta_n$ degenerate if there exists a measurable map
$\hat{\theta}_n : X_n \to \Theta_n$ such that

$$\lim_{n \to \infty} \int w_n(\Pi_n(x_n, \cdot), \delta_{\hat{\theta}_n(x_n)})\, P_n(dx_n) = 0.$$

If its localization $\Pi^*_n$ is degenerate, we call $\Pi_n$ locally degenerate.

Proposition 2.15. Let $S_n = \Theta_n$ and let
$(\mathcal{M}_n = (M_n, \mathrm{id}); n = 1, 2, \ldots)$ be consistent to
$(\Pi_n; n = 1, 2, \ldots)$ and also degenerate. Then $(\Pi_n; n = 1, 2, \ldots)$ is
degenerate.

Proof. By degeneracy, there exists $m_n \to \infty$ such that
$R'_{m_n}(\mathcal{M}_n)$ tends to 0. Then

$$\int\!\!\int w_n(\Pi_n(x_n, \cdot), \delta_{s(0)})\, M_n(x_n, ds_\infty)\, P_n(dx_n) \to 0 \tag{2.5}$$

since the left-hand side is bounded by
$R_{m_n}(\mathcal{M}_n, \Pi_n) + R_{m_n}(\mathcal{M}_n, \delta_{s(0)})$, where both
terms tend to 0. Write the marginal distribution of $M_n(x_n, \cdot)$ on $s(0)$
as $\mu_n(x_n, \cdot)$, that is,
$M_n(x_n, A \times S_n \times S_n \times \cdots) = \mu_n(x_n, A)$. Then the above
convergence can be rewritten as

$$\int_{x_n \in X_n} \int_{s \in S_n} w_n(\Pi_n(x_n, \cdot), \delta_s)\, \mu_n(x_n, ds)\, P_n(dx_n) \to 0.$$

By the triangle inequality, $w_n(\delta_s, \delta_t)$ is bounded by
$w_n(\Pi_n(x_n, \cdot), \delta_s)$ plus $w_n(\Pi_n(x_n, \cdot), \delta_t)$.
Hence we have

$$\int\!\!\int\!\!\int w_n(\delta_s, \delta_t)\, \mu_n(x_n, ds)\, \mu_n(x_n, dt)\, P_n(dx_n) \to 0.$$

For each $x_n$, we can find $\hat{\theta}_n(x_n)$ satisfying

$$\int_{s \in S_n} w_n(\delta_s, \delta_{\hat{\theta}_n(x_n)})\, \mu_n(x_n, ds) \le \int_{s, t \in S_n} w_n(\delta_s, \delta_t)\, \mu_n(x_n, ds)\, \mu_n(x_n, dt)$$

and measurable. Therefore we have
$\int_{X_n} \int_{s \in S_n} w_n(\delta_s, \delta_{\hat{\theta}_n(x_n)})\, \mu_n(x_n, ds)\, P_n(dx_n) \to 0$.
Hence, by the triangle inequality, we can replace $s(0)$ in (2.5) by
$\hat{\theta}_n(x_n)$, which completes the proof. □

In the stationary case, the following proposition is useful for proving
degeneracy.

Proposition 2.16. Let $F_n : S_n \to \Theta_n$. Let
$(\mathcal{M}_n = (M_n, F_n); n = 1, 2, \ldots)$ be stationary. Then
$(\mathcal{M}_n; n = 1, 2, \ldots)$ is degenerate if and only if

$$\int\!\!\int d_n(F_n(s(0)), F_n(s(1)))\, M_n(x_n, ds_\infty)\, P_n(dx_n) \to 0. \tag{2.6}$$

Proof. Since
$w_n(\delta_{F_n(s(0))}, \delta_{F_n(s(1))}) = d_n(F_n(s(0)), F_n(s(1)))$, degeneracy
implies (2.6) by taking $m = 2$ in the definition of degeneracy. Conversely,
suppose (2.6) holds and take
$E_m := \{s_\infty; d_n(F_n(s(i)), F_n(s(i+1))) < \epsilon/m\ (i = 0, \ldots, m-2)\}$.
For fixed $m \in \mathbb{N}$, by stationarity and the union bound,
$M_n(x_n, E_m^c) \le m\, M_n(x_n, \{s_\infty; d_n(F_n(s(0)), F_n(s(1))) \ge \epsilon/m\})$,
and the integral of the right-hand side with respect to $P_n(dx_n)$ tends to 0
by (2.6) and Markov's inequality. By the triangle inequality, on the event
$E_m$, $w_n(e_{n,m}(s_m), e_{n,1}(s_1))$, where
$e_{n,m}(s_m) = m^{-1} \sum_{i=0}^{m-1} \delta_{F_n(s(i))}$, is bounded by

$$m^{-1} \sum_{i=1}^{m-1} d_n(F_n(s(0)), F_n(s(i))) \le \sum_{i=0}^{m-2} d_n(F_n(s(i)), F_n(s(i+1))) < \epsilon.$$

Hence $\limsup_{n \to \infty} R'_m(\mathcal{M}_n) \le \epsilon$, which proves the
claim. □

We consider local degeneracy of the sequence of standard Gibbs samplers
defined in Section 6.1 of [5].

Proposition 2.17. A sequence of standard Gibbs samplers
$(\mathcal{M}_n; n = 1, 2, \ldots)$ is degenerate if and only if there exists a
measurable function $\hat{\theta}_n : X_n \times Y_n \to \Theta$ such that

$$\int w(P_n(d\theta|x_n, y_n), \delta_{\hat{\theta}_n})\, P_n(dx_n dy_n) \to 0. \tag{2.7}$$

Moreover, if $\Theta \subset \mathbb{R}^p$, $(\mathcal{M}_n; n = 1, 2, \ldots)$ is
locally degenerate under the scaling
$\theta \mapsto n^{1/2}(\theta - \hat{\theta}_n(x_n))$ if and only if there exists
$\hat{\tau}_n : X_n \times Y_n \to \mathbb{R}^p$ such that

$$\int w(P^*_n(d\theta|x_n, y_n), \delta_{\hat{\tau}_n})\, P_n(dx_n dy_n) \to 0$$

where $P^*_n(d\theta|x_n, y_n)$ is the localization of $P_n(d\theta|x_n, y_n)$.

Proof. Assume that $(\mathcal{M}_n; n = 1, 2, \ldots)$ is degenerate. Then by
Proposition 2.16 and (2.4),

$$\int d(\theta(0), \theta(1))\, P_n(d\theta(1)|x_n, y_n)\, P_n(dy_n|x_n, \theta(0))\, P_n(d\theta(0)|x_n)\, P_n(dx_n)$$

tends to 0. Then, as in the proof of Proposition 2.15, there exists a measurable
function $\hat{\theta}_n : X_n \times Y_n \to \Theta$ such that

$$\int d(\hat{\theta}_n(x_n, y_n), \theta(1))\, P_n(d\theta(1)|x_n, y_n)\, P_n(dx_n dy_n)$$

tends to 0. This proves (2.7) by (2.4).

Conversely, suppose (2.7) holds. By the triangle inequality,

$$w(\delta_{\theta(0)}, \delta_{\theta(1)}) = d(\theta(0), \theta(1)) \le d(\theta(0), \hat{\theta}_n(x_n, y_n)) + d(\hat{\theta}_n(x_n, y_n), \theta(1))$$

and by stationarity, the two terms on the right-hand side have the same law.
We have

$$\int d(\theta(0), \hat{\theta}_n(x_n, y_n))\, P_n(d\theta(0)|x_n, y_n) = w(P_n(d\theta|x_n, y_n), \delta_{\hat{\theta}_n(x_n, y_n)})$$

and the integral of the right-hand side with respect to
$P_n(dy_n|x_n) P_n(dx_n)$ tends to 0. Hence

$$\int\!\!\int w(\delta_{\theta(0)}, \delta_{\theta(1)})\, M_n(x_n, ds_\infty)\, P_n(dx_n) \to 0$$

and degeneracy follows by Proposition 2.16. The proof for local degeneracy
is the same, replacing the sequence $\theta(0), \theta(1), \ldots$ by
$F_n(x_n, \theta(0)), F_n(x_n, \theta(1)), \ldots$ where
$F_n(x_n, \theta) = n^{1/2}(\theta - \hat{\theta}_n(x_n))$. □

Remark 2.18. By the proposition, it is easy to show that when the standard
Gibbs sampler is (locally) degenerate, the standard multi-step Gibbs sampler
(not defined here) is also (locally) degenerate. This is another validation of
the ordering of transition kernels of [10]. We could not establish a similar
relation for local consistency.

For local degeneracy, we have the following. We omit the proof since it is
the same as that for local consistency.

Lemma 2.19. Let $F : \Theta \to \Psi$ be a $C^1$ map except on a compact set $N$ of
$\Theta$. Assume $P_n(\hat{\theta}_n \in N) \to 0$ and that the law of
$\hat{\theta}_n$ is tight. Then $(\mathcal{M}^F_n; n = 1, 2, \ldots)$ is locally
degenerate if $(\mathcal{M}_n; n = 1, 2, \ldots)$ is locally degenerate.



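3 Asymptotic properties for cumulative link model

We consider a cumulative link model. The probability space
$(X, \mathcal{X}, P_X)$ is defined by $X = \mathbb{R}^p$ and
$\mathcal{X} = \mathcal{B}(\mathbb{R}^p)$ with a probability measure $P_X$ having
compact support. For $c \ge 2$, let $Y = \{1, 2, \ldots, c\}$ and
$\mathcal{Y} = 2^Y$. Let $F$ be a cumulative distribution function on
$\mathbb{R}$. When $c \ge 3$, a parameter $\theta = (\alpha, \beta)$ is constructed
from $\alpha = (\alpha^2, \ldots, \alpha^{c-1})$ such that
$0 < \alpha^2 < \cdots < \alpha^{c-1}$ and $\beta \in \mathbb{R}^p$. When $c = 2$,
$\theta = \beta$. The model is

$$x \sim P_X(dx), \qquad P(y \le j \mid x) = F(\alpha^j + \beta^T x) \quad (j = 1, 2, \ldots, c) \tag{3.1}$$

with dummy parameters $\alpha^0 = -\infty$, $\alpha^1 = 0$ and $\alpha^c = +\infty$.
The parameter space $\Theta \subset \mathbb{R}^{c-2} \times \mathbb{R}^p$ is

$$\Theta = \{(\alpha^2, \alpha^3, \ldots, \alpha^{c-1}, \beta);\ 0 < \alpha^2 < \cdots < \alpha^{c-1},\ \beta \in \mathbb{R}^p\}. \tag{3.2}$$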
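The category probabilities implied by (3.1) are differences of consecutive cumulative probabilities. The sketch below computes them for a logistic link, which is an assumption made for illustration (the model allows a general $F$); the function names are hypothetical.

```python
import math
import random


def logistic_cdf(t):
    """Logistic link F, written so that t = -inf and t = +inf are handled."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def category_probs(alpha, beta, x, cdf=logistic_cdf):
    """P(y = j | x) for j = 1..c under P(y <= j | x) = F(alpha^j + beta^T x).

    alpha holds the free cut points (alpha^2, ..., alpha^{c-1}); the dummy
    values alpha^0 = -inf, alpha^1 = 0 and alpha^c = +inf are added here.
    """
    cuts = [-math.inf, 0.0, *alpha, math.inf]
    lin = sum(b * xi for b, xi in zip(beta, x))
    cum = [cdf(a + lin) for a in cuts]
    return [cum[j] - cum[j - 1] for j in range(1, len(cuts))]


def sample_category(alpha, beta, x, rng):
    """Draw y in {1, ..., c} from the category probabilities."""
    u, acc = rng.random(), 0.0
    probs = category_probs(alpha, beta, x)
    for j, p in enumerate(probs, start=1):
        acc += p
        if u < acc:
            return j
    return len(probs)


probs = category_probs([0.7, 1.5], [0.5], [1.0])   # c = 4 categories
binary = category_probs([], [0.5], [1.0])          # c = 2, theta = beta
y = sample_category([0.7, 1.5], [0.5], [1.0], random.Random(5))
```

For $c = 2$ the list `alpha` is empty and the first probability reduces to $F(\beta^T x)$, matching the convention $\theta = \beta$.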

This cumulative link model is useful for the analysis of ordered categorical
data. See monographs such as [11] and [1]. The Gibbs sampler for the model will
be studied in the next section. Before that, in this section, we show the
regularity of the model. First we check quadratic mean differentiability.

3.1 Quadratic mean differentiability of the model

We recall the definition of quadratic mean differentiability. Let
$(E, \mathcal{E})$ be a measurable space and $\{P(dx|\theta); \theta \in \Theta\}$ be
a parametric family on the space, where $\Theta \subset \mathbb{R}^p$ is an open
set. Assume the existence of a $\sigma$-finite measure $\nu$ on $(E, \mathcal{E})$
such that $P(dx|\theta) = p(x|\theta)\, \nu(dx)$ for an $\mathcal{E}$-measurable
function $p(x|\theta)$ for any fixed $\theta \in \Theta$.

Definition 3.1 (Quadratic mean differentiability). $P(dx|\theta)$ is called
quadratic mean differentiable at $\theta \in \Theta$ if there exists an
$\mathbb{R}^p$-valued $\mathcal{E}$-measurable function $\eta(x|\theta)$ such that

$$\int_{x \in E} \left| \sqrt{p(x|\theta + h)} - \sqrt{p(x|\theta)} - h^T \eta(x|\theta) \right|^2 \nu(dx) = o(|h|^2) \tag{3.3}$$

for any $h \in \mathbb{R}^p$ such that $h \to 0$.

When $P(dx|\theta)$ is quadratic mean differentiable at $\theta \in \Theta$, many
properties, such as local asymptotic normality of the likelihood ratio, hold
with minimal assumptions. See monographs such as [7] and [8].

Consider our model (3.1). The measurable space $(E, \mathcal{E})$ is
$(X \times Y, \mathcal{X} \otimes \mathcal{Y})$ for our model and the
$\sigma$-finite (in fact, finite) measure is defined by

$$\nu(dx\,dy) = P_X(dx) \sum_{i=1}^{c} \delta_i(dy).$$

For this choice of $\nu$, the function $p(xy|\theta)$ satisfying
$P(dx\,dy|\theta) = p(xy|\theta)\, \nu(dx\,dy)$ is
$p(xy|\theta) = F(\alpha^y + \beta^T x) - F(\alpha^{y-1} + \beta^T x)$. We assume
the following somewhat strong regularity condition. For $x \sim P_X(dx)$, write
the law of $\xi := (1, x^T)^T$ as $P_\xi$.

Assumption 3.2. 1. $F(x) = \int_{-\infty}^x f(y)\, dy$ for a continuous strictly
positive measurable function $f$.

2. The support of $P_\xi$ is compact and is not included in any subspace of
dimension strictly lower than $p + 1$.

Proposition 3.3. Under Assumption 3.2, P{dxdy\9) is quadratic mean differ- 
entiable at any 9. 

Proof. Take R c+p_2 -valued measurable function rj(xy\9) to be 

^9) = gg^ttl (3.4) 
2^pJxW) 

which is well defined by Assumption 3.2 and set 1(9) = (Iij(9);i,j — 1,2,... ,p+ 
c-2)by 

1(9) = 4 / ri(xy\9)ri(xy\9) T v(dxdy). 
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By Theorem 12.2.2 of [8], if 1(9) is continuous, P(dxdy\9) is quadratic mean 
differentiable. Since v is a finite measure, it is sufficient to show the existence 
of M for any bounded open set A, 

sup |77(xy|0)| < M (x,y € X x Y). 

6EA 

Take an open set $A$ such that its closure $\bar{A} \subset \Theta$ is compact. Take $\delta > 0$ such that

$$\delta < \inf\{\alpha^j - \alpha^{j-1};\ \theta \in A,\ j = 2,\ldots,c-1\}.$$

Then there exists $M_1$ such that

$$\sup_{j=1,2,\ldots,c-1}\{|x|,\ |\alpha^j + \beta^Tx|;\ \theta \in A,\ x \in \operatorname{supp} P_X\} < M_1,$$

and for this choice of $M_1$, by continuity and positivity of $f$, there exist constants
$c_*, c^* \in (0,\infty)$ such that

$$c_* < f(x) < c^* \quad (x \in [-M_1, M_1]).$$

Then for $i = 2,\ldots,c-1$,

$$F(\alpha^i + \beta^Tx) - F(\alpha^{i-1} + \beta^Tx) = \int_{\alpha^{i-1}+\beta^Tx}^{\alpha^i+\beta^Tx} f(t)dt > c_*\delta.$$

For $i = 1, c$, choosing $\delta > 0$ small enough, $F(\beta^Tx) > F(-M_1) > c_*\delta$ and
$1 - F(\alpha^{c-1} + \beta^Tx) > 1 - F(M_1) > c_*\delta$ are satisfied. Hence the denominator of
the right hand side of (3.4) is uniformly bounded away from zero for $\theta \in A$.
For the numerator of (3.4), we have

$$\partial_{\alpha^i}p(xy|\theta) = f(\alpha^i + \beta^Tx)1_{\{y=i\}} - f(\alpha^i + \beta^Tx)1_{\{y=i+1\}}$$

and

$$\partial_\beta p(xy|\theta) = x\left(f(\alpha^y + \beta^Tx) - f(\alpha^{y-1} + \beta^Tx)\right).$$

The absolute values of the above two terms are uniformly bounded by $c^*\max\{1, M_1\}$
for $\theta \in A$. Hence (3.4) is uniformly bounded, and the claim follows from Theorem
12.2.2 of [8] together with the bounded convergence theorem. □
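As a small numerical illustration of (3.3), (3.4) and the definition of $I(\theta)$, the following sketch treats the simplest case $c = 2$, $p = 1$ with a logistic $F$ and a finitely supported $P_X$. The support points, weights and function names are hypothetical choices made only for this example. It checks that $4\int\eta^2 d\nu$ agrees with the closed form $\int x^2 f(\beta x)P_X(dx)$, which holds for the logistic link because $f = F(1-F)$, and that the remainder in (3.3) divided by $h^2$ shrinks as $h \to 0$.

```python
import math

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def logistic_pdf(t):
    return logistic_cdf(t) * (1.0 - logistic_cdf(t))

def density(x, y, beta):
    """p(xy|theta) for c = 2: P(y = 1 | x) = F(beta * x), P(y = 2 | x) = 1 - F(beta * x)."""
    prob_one = logistic_cdf(beta * x)
    return prob_one if y == 1 else 1.0 - prob_one

def eta(x, y, beta):
    """eta(xy|theta) = d/d(beta) p / (2 sqrt(p)), the form used in (3.4)."""
    sign = 1.0 if y == 1 else -1.0
    return sign * x * logistic_pdf(beta * x) / (2.0 * math.sqrt(density(x, y, beta)))

def fisher_information(beta, atoms):
    """I(theta) = 4 * integral of eta^2 d(nu), nu = P_X times counting measure on {1, 2}."""
    return 4.0 * sum(w * eta(x, y, beta) ** 2 for x, w in atoms for y in (1, 2))

def qmd_remainder(beta, h, atoms):
    """Left-hand side of (3.3) for a scalar perturbation h."""
    total = 0.0
    for x, w in atoms:
        for y in (1, 2):
            diff = (math.sqrt(density(x, y, beta + h)) - math.sqrt(density(x, y, beta))
                    - h * eta(x, y, beta))
            total += w * diff * diff
    return total

atoms = [(-1.0, 0.3), (0.5, 0.4), (2.0, 0.3)]  # hypothetical support points and weights of P_X
beta = 0.7
info = fisher_information(beta, atoms)
closed_form = sum(w * x * x * logistic_pdf(beta * x) for x, w in atoms)
ratios = [qmd_remainder(beta, h, atoms) / h ** 2 for h in (0.1, 0.01, 0.001)]
```

The list `ratios` holds the remainder of (3.3) divided by $|h|^2$ for decreasing $h$; quadratic mean differentiability corresponds to these values tending to zero.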

For $x_n = (x^1,\ldots,x^n)$ and $y_n = (y^1,\ldots,y^n)$, set

$$Z_n(x_n,y_n|\theta) = n^{-1/2}\sum_{i=1}^{n}\frac{2\eta(x^iy^i|\theta)}{\sqrt{p(x^iy^i|\theta)}} \tag{3.5}$$

where $\eta(xy|\theta)$ is defined by (3.4). This function is called a normalized score
function. By quadratic mean differentiability, the law of $Z_n(x_n,y_n|\theta)$ tends
to $N(0, I(\theta))$. Moreover, if there exists a uniformly consistent test, the posterior
distribution tends to a normal distribution. In the next subsection, we show the
existence of the test.

3.2 Uniformly consistent test

We prepare notation for the large sample setting. Let

$$(X_n \times Y_n, \mathcal{X}_n \otimes \mathcal{Y}_n, P_n(dx_ndy_n|\theta)) = (X \times Y, \mathcal{X} \otimes \mathcal{Y}, P(dxdy|\theta))^{\otimes n}$$

and write $(x_n,y_n)$ for its elements, where $x_n = (x^1,\ldots,x^n)$, $y_n = (y^1,\ldots,y^n)$. For
$\theta_0 \in \Theta$ and $\theta_0 \in K \subset \Theta$, a sequence of measurable functions $\psi_n : X_n \times Y_n \to [0,1]$
will be called a uniformly consistent test for $\theta_0$ against $K^c$ if both

$$\int\psi_n(x_n,y_n)P_n(dx_ndy_n|\theta_0), \qquad \sup_{\theta\in K^c}\int(1-\psi_n(x_n,y_n))P_n(dx_ndy_n|\theta) \tag{3.6}$$

tend to $0$ as $n \to \infty$. We prove the existence of a uniformly consistent test.
The following lemma states that it is sufficient to construct uniformly consistent
tests for smaller parameter spaces.
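For $\theta_0 = (\alpha_0^2,\ldots,\alpha_0^{c-1},\beta_0)$, define

$$B_{\epsilon,i}(\theta_0) := \{\theta = (\alpha^2,\ldots,\alpha^{c-1},\beta);\ (|\alpha^i - \alpha_0^i|^2 + |\beta - \beta_0|^2)^{1/2} < \epsilon\}$$

and $B_\epsilon(\theta_0) := \{\theta; |\theta - \theta_0| < \epsilon\}$.

Lemma 3.4. Let $c \ge 3$. Suppose that for any $\epsilon > 0$ and $i = 2,\ldots,c-1$, there
exists a uniformly consistent test $(\psi_{n,i}; n = 1,2,\ldots)$ for $\theta_0$ against $B_{\epsilon,i}(\theta_0)^c$.
Then for any $\epsilon > 0$, there exists a uniformly consistent test $(\psi_n; n = 1,2,\ldots)$
for $\theta_0$ against $B_\epsilon(\theta_0)^c$.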

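Proof. For notational simplicity, set $\theta_0 = 0$ and write $B_{\epsilon,i}$ and $B_\epsilon$ instead
of $B_{\epsilon,i}(\theta_0)$ and $B_\epsilon(\theta_0)$. We show that the sequence of tests defined by $\psi_n :=
\max_{i=2,3,\ldots,c-1}\psi_{n,i}$ is a uniformly consistent test for $\theta_0$ against $B_{(c-2)\epsilon}^c$ if each $(\psi_{n,i}; n =
1,2,\ldots)$ is one for $\theta_0$ against $B_{\epsilon,i}^c$.
First observe that

$$\int\psi_nP_n(dx_ndy_n|\theta_0) \le \sum_{i=2}^{c-1}\int\psi_{n,i}P_n(dx_ndy_n|\theta_0) \to 0.$$

On the other hand, by the obvious inequality

$$|\theta|^2 = \sum_{i=2}^{c-1}|\alpha^i|^2 + |\beta|^2 \le \sum_{i=2}^{c-1}(|\alpha^i|^2 + |\beta|^2),$$

for any $\theta \in B_{(c-2)\epsilon}^c$, there exists $i$ such that $\theta \in B_{\epsilon,i}^c$. Therefore

$$\sup_{\theta\in B_{(c-2)\epsilon}^c}\int(1-\psi_n)P_n(dx_ndy_n|\theta) \le \max_{i=2,\ldots,c-1}\Big\{\sup_{\theta\in B_{\epsilon,i}^c}\int(1-\psi_{n,i})P_n(dx_ndy_n|\theta)\Big\}$$

which tends to $0$ by assumption. Hence the claim follows. □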
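The step from the "obvious inequality" to the existence of a suitable index $i$ can be spelled out as follows. This is a short worked derivation under the convention $\theta_0 = 0$ used in the proof, with $B_{(c-2)\epsilon}^c$ and $B_{\epsilon,i}^c$ denoting complements.

```latex
% For \theta with |\theta| \ge (c-2)\epsilon (that is, \theta \in B_{(c-2)\epsilon}^c):
\begin{align*}
(c-2)^2\epsilon^2 \;\le\; |\theta|^2
  \;\le\; \sum_{i=2}^{c-1}\bigl(|\alpha^i|^2 + |\beta|^2\bigr)
  \;\le\; (c-2)\max_{i=2,\ldots,c-1}\bigl(|\alpha^i|^2 + |\beta|^2\bigr),
\end{align*}
% hence, for the maximising index i,
\begin{align*}
\bigl(|\alpha^i|^2 + |\beta|^2\bigr)^{1/2} \;\ge\; (c-2)^{1/2}\epsilon \;\ge\; \epsilon,
\qquad\text{i.e. } \theta \in B_{\epsilon,i}^c .
\end{align*}
```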
Next we see that for $c = 2$ we can construct a uniformly consistent test. We
use the argument of Steps 1 and 2 of Note 8.4.3 of [7]. If we can show the
existence of a test $\psi_n : X_n \times Y_n \to [0,1]$ for some $n \in \mathbb{N}$ and a compact set $K$ such
that

$$\int\psi_nP_n(dx_ndy_n|\theta_0) < \frac{1}{2} < \inf_{\theta\in K^c}\int\psi_nP_n(dx_ndy_n|\theta), \tag{3.7}$$

then the existence of a uniformly consistent test for $\theta_0$ against $B_\epsilon(\theta_0)^c$ follows for
any $\epsilon > 0$. This fact comes from quadratic mean differentiability of the model
and continuity of $\theta \mapsto P(dxdy|\theta)$ in the Prokhorov metric.

Lemma 3.5. Under Assumption 3.2 with $c = 2$, there exists a uniformly consistent test for $\theta_0$ against $B_\epsilon(\theta_0)^c$ for any $\theta_0 \in \Theta$, $\epsilon > 0$.

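Proof. We take three steps to construct a uniformly consistent test. In the first
step, we divide $\Theta$ into $p$ subsets $(\Theta_i; i = 1,\ldots,p)$. In the second step, we construct a uniformly consistent test $\psi_{n,i}$ for each parametric family $\{P(dxdy|\theta); \theta \in
\Theta_i\}$. In the last step we set $\psi_n = \max_{i=1,2,\ldots,p}\psi_{n,i}$, which will be a uniformly
consistent test.

For the first step, construct $\Theta_i$ $(i = 1,2,\ldots,p)$. Choose $(z_i; i = 1,2,\ldots,p)$
from $\operatorname{supp} P_X$ so that $\operatorname{span}(z_i; i = 1,2,\ldots,p) = \mathbb{R}^p$. Then there exists $\delta > 0$ such
that for any $\xi \in \mathbb{R}^p$ having $|\xi| = 1$, there exists $i \in \{1,2,\ldots,p\}$ such that $|\xi^Tz_i| > 2\delta$.

Since $z_i \in \operatorname{supp} P_X$, $p_i := \int_{B_\delta(z_i)}P_X(dx) > 0$, and for $|\xi| = 1$ there exists $i \in
\{1,2,\ldots,p\}$ such that

$$\xi^Tx > \delta\ (\forall x \in B_\delta(z_i)), \quad\text{or}\quad \xi^Tx < -\delta\ (\forall x \in B_\delta(z_i)).$$

Therefore, if we take

$$\tilde\Theta_i := \{\theta \ne 0;\ \theta^Tx > \delta|\theta|\ (\forall x \in B_\delta(z_i)),\ \text{or}\ \theta^Tx < -\delta|\theta|\ (\forall x \in B_\delta(z_i))\}$$

then $\bigcup_{i=1}^{p}\tilde\Theta_i = \mathbb{R}^p\setminus\{0\}$. To make them disjoint, set $\Theta_1 = \tilde\Theta_1 \cup \{0\}$ and $\Theta_i = \tilde\Theta_i\setminus\bigcup_{j<i}\tilde\Theta_j$
for $i = 2,\ldots,p$.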

In the second step, we construct a uniformly consistent test for the parametric family $\{P(dxdy|\theta); \theta \in \Theta_i\}$ for each $i = 1,2,\ldots,p$. We show that we can
construct a test $\psi_{2,i}$ on $X_2 \times Y_2$ which satisfies (3.7). Write $x_2 = (x^1,x^2) \in X_2$
and $y_2 = (y^1,y^2) \in Y_2$. The test is

$$\psi_{2,i}(x_2,y_2) = \begin{cases} 1/2 & \text{if } x^1 \text{ or } x^2 \in B_\delta(z_i)^c,\\ a & \text{if } x^1, x^2 \in B_\delta(z_i) \text{ and } y^1 = y^2,\\ 0 & \text{otherwise} \end{cases}$$

where $a \in (0,1)$ will be defined later. Note that since $2(p_i^2 + (1-p_i)^2) \ge 1$,
$\psi_{2,i} : X_2 \times Y_2 \to [0,1]$. By definition, $\int\psi_{2,i}(x_2,y_2)P_2(dx_2dy_2|\theta)$ is

$$\frac{1-p_i^2}{2} + a\Big(\int_{B_\delta(z_i)}F(\theta^Tx)P_X(dx)\Big)^2 + a\Big(\int_{B_\delta(z_i)}(1-F(\theta^Tx))P_X(dx)\Big)^2. \tag{3.8}$$

When $a = 1/2$, this value is bounded by

$$\frac{1-p_i^2}{2} + \frac{1}{2}\Big(\int_{B_\delta(z_i)}F(\theta^Tx)P_X(dx) + \int_{B_\delta(z_i)}(1-F(\theta^Tx))P_X(dx)\Big)^2$$

which equals $1/2$. If we take $|\theta| \to \infty$, by definition of $\Theta_i$, $(F(\theta^Tx), 1 -
F(\theta^Tx))$ tends to $(1,0)$ or $(0,1)$ for $x \in B_\delta(z_i)$ and hence (3.8) tends to

$$\frac{1-p_i^2}{2} + ap_i^2.$$

Hence if we take $a$ slightly larger than $1/2$ so that $\int\psi_{2,i}(x_2,y_2)P_2(dx_2dy_2|\theta_0) <
1/2$ still holds, the limit above exceeds $1/2$ and there exists a compact set $K$ such that (3.7) holds. Hence we can find a
uniformly consistent test for $\theta_0$ against $B_\epsilon(\theta_0)^c \cap \Theta_i$.

In the last step, we take $\psi_n = \max_{i=1,\ldots,p}\psi_{n,i}$ where each $(\psi_{n,i}; n =
1,2,\ldots)$ is a uniformly consistent test for $\theta_0$ against $B_\epsilon(\theta_0)^c \cap \Theta_i$. Then by construction $(\psi_n; n = 1,2,\ldots)$ is a uniformly consistent test for $\theta_0$ against $B_\epsilon(\theta_0)^c$. □
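The two-observation test of the second step can be checked numerically. The sketch below is only an illustration under assumed values: $p = 1$, a logistic $F$, a hypothetical four-point $P_X$, $z_i = 1$, $\delta = 0.2$, $a = 0.6$ and $\theta_0 = 0.5$. It computes the expectation of the test by direct enumeration, compares it with the closed form $(1-p_i^2)/2 + aA^2 + aB^2$, where $A$ and $B$ are the integrals of $F(\theta x)$ and $1 - F(\theta x)$ over the ball, and confirms that the value is below $1/2$ at $\theta_0$ and above $1/2$ for large $|\theta|$, as (3.7) requires.

```python
import math
from itertools import product

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def in_ball(x, center=1.0, delta=0.2):
    """Indicator of B_delta(z_i) for scalar x (hypothetical z_i = 1, delta = 0.2)."""
    return abs(x - center) < delta

def psi(x1, x2, y1, y2, a):
    """The two-observation test: 1/2 outside the ball, a on agreement, 0 otherwise."""
    if not (in_ball(x1) and in_ball(x2)):
        return 0.5
    return a if y1 == y2 else 0.0

def response_prob(y, x, theta):
    """P(y | x) for c = 2: P(y = 1 | x) = F(theta * x)."""
    prob_one = logistic_cdf(theta * x)
    return prob_one if y == 1 else 1.0 - prob_one

def expected_psi(theta, atoms, a):
    """Integral of psi under two i.i.d. observations, by direct enumeration."""
    total = 0.0
    for (x1, w1), (x2, w2) in product(atoms, repeat=2):
        for y1, y2 in product((1, 2), repeat=2):
            weight = w1 * w2 * response_prob(y1, x1, theta) * response_prob(y2, x2, theta)
            total += weight * psi(x1, x2, y1, y2, a)
    return total

def closed_form(theta, atoms, a):
    """(1 - p_i^2)/2 + a * A^2 + a * B^2 with A, B the two integrals over the ball."""
    p_i = sum(w for x, w in atoms if in_ball(x))
    mass_one = sum(w * logistic_cdf(theta * x) for x, w in atoms if in_ball(x))
    mass_two = p_i - mass_one
    return (1.0 - p_i ** 2) / 2.0 + a * (mass_one ** 2 + mass_two ** 2)

atoms = [(0.9, 0.2), (1.0, 0.2), (1.1, 0.2), (-2.0, 0.4)]  # hypothetical P_X: (point, weight)
a = 0.6
theta0 = 0.5
at_null = expected_psi(theta0, atoms, a)
far_away = [expected_psi(theta, atoms, a) for theta in (-50.0, 50.0)]
```

With these assumed values, `at_null` is below $1/2$ while both entries of `far_away` are close to $(1-p_i^2)/2 + ap_i^2 = 0.536$, so a compact $K$ separating the two regimes exists.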

Now we extend this test to the model (3.1) for $c \ge 3$. By Lemma 3.4 it
is sufficient to construct the test for $\theta_0$ against $B_{\epsilon,i}(\theta_0)^c$ for each $i = 2,\ldots,c-1$.
We apply the test constructed in Lemma 3.5 for each $i$. Let $Z = \{1,2\}$ and
$Z_n = \{1,2\}^n$ and define a map $\pi_i : X \times Y \to X \times Z$ by $\pi_i(x,y) = (x, 1 + 1(y >
i))$ and $\pi_{n,i} : X_n \times Y_n \to X_n \times Z_n$ as its obvious generalization. When
$(x,y) \sim P(dxdy|\theta)$, the law of $(x,z) = \pi_i(x,y)$ only depends on $\alpha^i$ and $\beta$ and is
defined by

$$x \sim P_X(dx), \quad P(z = 1|\alpha^i,\beta,x) = 1 - P(z = 2|\alpha^i,\beta,x) = F(\alpha^i + \beta^Tx).$$

Therefore it is a model (3.1) for $c = 2$ with explanatory variable $(1,x^T)^T$ and
parameter $(\alpha^i,\beta^T)^T$. Write $P(dxdz|\alpha^i,\beta)$ for the above model. For the parametric
family $\{P(dxdz|\alpha^i,\beta); \alpha^i \in \mathbb{R}, \beta \in \mathbb{R}^p\}$, by Lemma 3.5, we can construct a
uniformly consistent test $(\tilde\psi_{n,i}; n = 1,2,\ldots)$ for $(\alpha_0^i,\beta_0)$ against $\{(\alpha^i,\beta); (|\alpha^i -
\alpha_0^i|^2 + |\beta - \beta_0|^2)^{1/2} \ge \epsilon\}$. Then $\psi_{n,i}(x_n,y_n) := \tilde\psi_{n,i}(\pi_{n,i}(x_n,y_n))$ defines a
uniformly consistent test for $\theta_0$ against $B_{\epsilon,i}(\theta_0)^c$. Hence $\psi_n = \max_{i=2,\ldots,c-1}\psi_{n,i}$
is a uniformly consistent test for $\theta_0$ against $B_\epsilon(\theta_0)^c$. In summary, we obtain the
following.

Proposition 3.6. For the model (3.1) under Assumption 3.2, there exists a
uniformly consistent test for $\theta_0$ against $B_\epsilon(\theta_0)^c$ for any $\theta_0 \in \Theta$, $\epsilon > 0$.
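The reduction from $c \ge 3$ to $c = 2$ used to reach this proposition rests on the map $\pi_i(x,y) = (x, 1 + 1(y > i))$. The following minimal sketch, assuming a logistic $F$, scalar $x$ and hypothetical parameter values, verifies that the collapsed response satisfies $P(z = 1|x) = F(\alpha^i + \beta x)$ exactly.

```python
import math

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def category_probabilities(x, alphas, beta):
    """P(y = j | x), j = 1..c, under (3.1); alphas = [alpha^1, ..., alpha^{c-1}]."""
    cuts = [-math.inf] + list(alphas) + [math.inf]      # alpha^0 = -inf, alpha^c = +inf
    cdf = [logistic_cdf(a + beta * x) for a in cuts]
    return [cdf[j] - cdf[j - 1] for j in range(1, len(cuts))]

def collapse(x, y, i):
    """The map pi_i(x, y) = (x, 1 + 1(y > i))."""
    return x, 1 + int(y > i)

def collapsed_prob_z1(x, alphas, beta, i):
    """P(z = 1 | x) for (x, z) = pi_i(x, y), summing the probabilities of y with z = 1."""
    probs = category_probabilities(x, alphas, beta)
    return sum(p for y, p in enumerate(probs, start=1) if collapse(x, y, i)[1] == 1)

alphas, beta, x = [0.0, 0.8, 1.7], -0.6, 1.3            # hypothetical values, c = 4
checks = [(collapsed_prob_z1(x, alphas, beta, i), logistic_cdf(alphas[i - 1] + beta * x))
          for i in (1, 2, 3)]
```

Each pair in `checks` holds the collapsed probability and the binary-model probability for one cutpoint index $i$; the two entries coincide up to rounding.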

If there exists a uniformly consistent test, the posterior distribution is
consistent under a regularity condition on the prior distribution. Let $\Lambda(d\theta) =
\lambda(\theta)d\theta$ be a prior distribution, where $d\theta$ denotes the Lebesgue measure. Let

$$P_n(dx_n,dy_n) = \int P_n(dx_ndy_n|\theta)\Lambda(d\theta).$$

Assume the existence of $P_n(d\theta|x_n,y_n)$ such that

$$P_n(d\theta|x_n,y_n)P_n(dx_n,dy_n) = P_n(dx_n,dy_n|\theta)\Lambda(d\theta).$$

Write $I(\theta)$ for the Fisher information matrix of $P(dxdy|\theta)$ and write $\hat\theta_n$ for the
central value of $P_n(d\theta|x_n,y_n)$. The following is a consequence of the Bernstein-von
Mises theorem. Let $\|\mu - \nu\| = \sup_{A\in\mathcal{E}}|\mu(A) - \nu(A)|$ be the total variation
distance between probability measures $\mu$ and $\nu$ on $(E,\mathcal{E})$.

Corollary 3.7. Assume $\lambda$ is continuous and strictly positive. Under Assumption 3.2,

$$\int_{X_n\times Y_n}\|P_n(d\theta|x_n,y_n) - N(\hat\theta_n, n^{-1}I(\hat\theta_n)^{-1})\|P_n(dx_ndy_n) \to 0.$$

We will write $\Pi_n(x_n,y_n,d\theta)$ for $P_n(d\theta|x_n,y_n)$. We also write $\Pi_n^*(x_n,y_n,d\theta)$
for its scaling by $\theta \mapsto n^{1/2}(\theta - \hat\theta_n)$. By the above corollary, the total variation
distance between $\Pi_n^*(x_n,y_n,d\theta)$ and $N(0, I(\hat\theta_n)^{-1})$ tends to $0$.

4 Application to Gibbs sampler for cumulative 
link model 

We consider asymptotic properties of the Gibbs sampler for the cumulative link
model. Let $(X \times Y, \mathcal{X} \otimes \mathcal{Y})$ be the measurable space defined in Section 3 and let
$P(dxdy|\theta)$ be the parametric family defined in (3.1). Under the same settings as
Subsection 3.2, we construct Markov chain Monte Carlo methods on the model
(3.1) and examine their efficiency.

4.1 Gibbs sampler and its marginal augmentation 

To construct the Gibbs sampler, we introduce a hidden variable $z \in Z = \mathbb{R}$.
There are several possibilities for the choice of the structure. We consider two
choices among them. We refer to the former as "null-conditional update" and to the latter as "$\beta^Tx$-conditional update":

$$\begin{cases} x \sim P_X(dx), \quad z \sim f(z)dz, \quad y = j \text{ if } z \in (\alpha^{j-1} + \beta^Tx,\ \alpha^j + \beta^Tx]\\ x \sim P_X(dx), \quad z \sim f(z + \beta^Tx)dz, \quad y = j \text{ if } z \in (\alpha^{j-1}, \alpha^j] \end{cases} \tag{4.1}$$

The above update defines $P(dxdydz|\theta)$. For example, for the $\beta^Tx$-conditional update

$$P(dxdydz|\theta) = \sum_{j=1}^{c}P_X(dx)f(z + \beta^Tx)1_{(\alpha^{j-1},\alpha^j]}(z)dz\,\delta_j(dy).$$

For each construction, $P(dxdy|\theta) = \int_{z\in Z}P(dxdydz|\theta)$ is equal to the parametric family defined in (3.1). As in Definition 2.9, we can construct a sequence of
standard Gibbs samplers $\mathcal{M}_n = (M_n,\theta)$ on $(X_n \times Y_n, \mathcal{X}_n \otimes \mathcal{Y}_n, P_n(dx_ndy_n))$ on
$(S,\Theta)$ where $S = Z_n \times \Theta$.
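The two latent-variable constructions in (4.1) can be compared by simulation. The sketch below is an illustration only: it assumes a logistic $f$, a fixed scalar $x$, and hypothetical values of $\alpha$ and $\beta$ with $c = 4$. It draws $(z, y)$ under each construction and compares the empirical frequencies of $y$ with the category probabilities $F(\alpha^j + \beta x) - F(\alpha^{j-1} + \beta x)$ of (3.1).

```python
import math
import random

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def logistic_draw(rng):
    """Inverse-CDF draw from the logistic density f."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return math.log(u / (1.0 - u))

def category(z, cutpoints):
    """y = j iff z lies in (cutpoint^{j-1}, cutpoint^j], with -inf and +inf at the ends."""
    return 1 + sum(1 for a in cutpoints if z > a)

def null_conditional(x, alphas, beta, rng):
    z = logistic_draw(rng)                                  # z ~ f(z) dz
    return z, category(z, [a + beta * x for a in alphas])   # cutpoints shifted by beta * x

def betax_conditional(x, alphas, beta, rng):
    z = logistic_draw(rng) - beta * x                       # z ~ f(z + beta * x) dz
    return z, category(z, alphas)                           # cutpoints alpha^j themselves

def frequencies(draw, x, alphas, beta, n, rng):
    counts = [0] * (len(alphas) + 1)
    for _ in range(n):
        _, y = draw(x, alphas, beta, rng)
        counts[y - 1] += 1
    return [k / n for k in counts]

alphas, beta, x = [0.0, 0.8, 1.7], -0.6, 1.3   # hypothetical values, c = 4, alpha^1 = 0
cdf = [0.0] + [logistic_cdf(a + beta * x) for a in alphas] + [1.0]
exact = [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]
rng = random.Random(1)
freq_null = frequencies(null_conditional, x, alphas, beta, 20000, rng)
freq_betax = frequencies(betax_conditional, x, alphas, beta, 20000, rng)
```

Both `freq_null` and `freq_betax` approximate `exact`, which reflects that the two constructions share the same $X \times Y$ marginal even though the joint law of $(y, z)$ differs.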

The Gibbs sampler $\mathcal{M}_n$ is known to work poorly except for $c = 2$ with $\beta^Tx$-conditional update. This phenomenon can be explained by our approach. When
$\hat\theta_n(x_n,y_n)$ is the central value of $P_n(d\theta|x_n,y_n)$, a scaling $\theta \mapsto n^{1/2}(\theta - \hat\theta_n(x_n,y_n))$
can be defined. We will show that the sequence of standard Gibbs samplers
$\mathcal{M}_n$ is not locally consistent, because the model does not satisfy the regularity
condition of Theorem 6.4 of [5], except in the case $c = 2$ with $\beta^Tx$-conditional
update. The details are discussed later.

On the other hand, there are some Markov chain Monte Carlo methods which
work better than the above Gibbs sampler. We consider a marginal augmentation
method introduced by [12] (see also a closely related algorithm, the parameter expansion method of [9]). In the method, we introduce a new parameter $g \in (0,\infty)$
with prior $\Lambda_g$ and write $\vartheta = (\theta,g) \in \Theta^\times := \Theta \times (0,\infty)$ for the new parameter
set with new prior distribution

$$\Lambda^\times(d\vartheta) = \Lambda(g\,d\theta)\Lambda_g(dg). \tag{4.2}$$

The new model with the new parameter set is defined by

$$\begin{cases} x \sim P_X(dx), \quad z \sim f(gz)g\,dz, \quad y = j \text{ if } z \in (\alpha^{j-1} + \beta^Tx,\ \alpha^j + \beta^Tx]\\ x \sim P_X(dx), \quad z \sim f(g(z + \beta^Tx))g\,dz, \quad y = j \text{ if } z \in (\alpha^{j-1}, \alpha^j] \end{cases} \tag{4.3}$$

where we refer to the former as "null-conditional update with marginal augmentation" and to the latter as "$\beta^Tx$-conditional update with marginal augmentation". Write $P(dxdydz|\vartheta)$ for the above parametric family. The original model
$P(dxdydz|\theta)$ corresponds to $P(dxdydz|\vartheta = (\theta,1))$. Some important properties
are summarized as follows:

1. Its $X \times Y$ marginal is written by the original model: for $\vartheta^* = (\theta^*,g^*)$,

$$\int_{z\in Z}P(dxdydz|\vartheta^*) = P(dxdy|\vartheta^*) = P(dxdy|\theta = g^*\theta^*)$$

where the parametric family in the right hand side is (3.1).

2. The $g\theta$-marginals of the prior and posterior distributions for the parameter expanded model are the same as those without expansion, that is

$$\int_{\vartheta=(\theta,g)\in\Theta^\times}1_A(g\theta)\Lambda^\times(d\vartheta) = \Lambda(A),$$

$$\int_{\vartheta=(\theta,g)\in\Theta^\times}1_A(g\theta)P_n(d\vartheta|x_n,y_n) = \int_AP_n(d\theta|x_n,y_n).$$

3. The probability distribution $P_n(dx_ndy_n)$ is well defined in the following
sense:

$$\int_{\Theta^\times}P_n(dx_ndy_n|\vartheta)\Lambda^\times(d\vartheta) = \int_\Theta P_n(dx_ndy_n|\theta)\Lambda(d\theta).$$

We construct a standard Gibbs sampler $(M_n^\times,\vartheta)$ on $(X_n \times Y_n, \mathcal{X}_n \otimes \mathcal{Y}_n, P_n(dx_ndy_n))$
on $(S^\times,\Theta^\times)$ where $S^\times = Z_n \times \Theta^\times$.

We will call $\{(M_n^\times, g\theta); n = 1,2,\ldots\}$ (not $(M_n^\times,\vartheta)$) a sequence of standard
Gibbs samplers with marginal augmentation.
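The first property in the list above, that the $X \times Y$ marginal of the expanded model at $\vartheta = (\theta,g)$ is the original model at $g\theta$, follows from a change of variables. A short derivation for both constructions, writing $u$ for the substituted variable (a symbol introduced only here):

```latex
% Null-conditional update with marginal augmentation: z has density g f(gz); put u = gz.
\begin{align*}
P(y \le j \mid x, \vartheta)
  = \int_{-\infty}^{\alpha^j + \beta^T x} f(gz)\, g\, dz
  = \int_{-\infty}^{g\alpha^j + (g\beta)^T x} f(u)\, du
  = F\bigl(g\alpha^j + (g\beta)^T x\bigr).
\end{align*}
% beta^T x-conditional update with marginal augmentation: put u = g(z + \beta^T x).
\begin{align*}
P(y \le j \mid x, \vartheta)
  = \int_{-\infty}^{\alpha^j} f\bigl(g(z + \beta^T x)\bigr)\, g\, dz
  = \int_{-\infty}^{g\alpha^j + (g\beta)^T x} f(u)\, du
  = F\bigl(g\alpha^j + (g\beta)^T x\bigr).
\end{align*}
% In both cases the law of y given x is that of model (3.1) at the parameter g\theta.
```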

In our approach, we can show the results summarized in Table 1.

According to the table, marginal augmentation has better asymptotic properties in some cases for $c = 2, 3$. However, for $c \ge 4$, none of the Gibbs samplers
is locally consistent, even with marginal augmentation.

              Null    $\beta^Tx$    Null with MA    $\beta^Tx$ with MA
$c = 2$       X       O             P               O
$c = 3$       X       X             X               O
$c \ge 4$     X       X             X               X

Table 1: Asymptotic properties of the Gibbs sampler with and without marginal
augmentation (MA). The letter O means local consistency and X means local
non-consistency. P means local consistency for $p = 1$.



Remark 4.1. There are some other Markov chain Monte Carlo methods which
improve on the original Gibbs sampler. For example, parameter expansion methods are
studied in [9] and [3]. These algorithms are closely related to the marginal
augmentation method and seem to have the same asymptotic properties described above. In [2], a Metropolis-within-Gibbs algorithm is considered. It seems
to have local consistency even for $c \ge 4$, although the choice of the proposal distribution is difficult.

Figure 2 shows simulation results for the cumulative probit model with $c = 4$
for the Gibbs samplers with $\beta^Tx$-conditional update, with and without marginal augmentation. These are trajectories of the sequence $\theta(i)$ $(i = 0,\ldots,m-1)$ for $m = 200$
generated by the Gibbs samplers.



[Figure: Trajectory of Gibbs samplers]

Figure 2: Trajectory of the Gibbs samplers for sample size $n = 1000$ for $\alpha^2$
(upper), $\alpha^3$ (middle) and $\beta$ (bottom). The solid line is without MA and the dashed
line is with MA. The horizontal line is the true value.

The above figure shows that (a) "without MA" is much worse than "with
MA", (b) for both Gibbs samplers, the mixing property for $\beta$ is not so bad and
(c) "with MA" seems to work well for all parameters. However, according to
Table 1, "with MA" is also locally non-consistent.
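To make the setting of this simulation concrete, the following is a minimal sketch of a Gibbs sampler with $\beta^Tx$-conditional update and without marginal augmentation for the cumulative probit model. It is not the code behind Figure 2. It assumes $p = 1$, $\alpha^1 = 0$ fixed, flat (improper) priors on $\beta$ and the free cutpoints, and that every category is observed; all function names and numerical values are hypothetical. Under these assumptions $z$ given $(\theta, x, y)$ is a truncated normal, $\beta$ given $z$ is normal, and each free $\alpha^j$ given $(z, y)$ is uniform between the largest $z$ in category $j$ and the smallest $z$ in category $j+1$.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def truncated_normal(mean, lower, upper, rng):
    """Draw from N(mean, 1) restricted to (lower, upper] by inverting the CDF."""
    lo = STD_NORMAL.cdf(lower - mean) if lower > -math.inf else 0.0
    hi = STD_NORMAL.cdf(upper - mean) if upper < math.inf else 1.0
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)          # inv_cdf needs 0 < u < 1
    return min(max(mean + STD_NORMAL.inv_cdf(u), lower), upper)

def simulate(n, alphas, beta, rng):
    """Data from the model; alphas = [alpha^1 = 0, ..., alpha^{c-1}], scalar x."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        z = rng.gauss(0.0, 1.0) - beta * x       # latent z with density f(z + beta * x)
        xs.append(x)
        ys.append(1 + sum(1 for a in alphas if z > a))
    return xs, ys

def gibbs(xs, ys, c, n_iter, rng):
    """beta^T x-conditional Gibbs sampler without MA (flat priors assumed)."""
    if set(ys) != set(range(1, c + 1)):
        raise ValueError("every category 1..c must be observed")
    alphas = [float(j) for j in range(c - 1)]    # alpha^1 = 0 stays fixed
    beta = 0.0
    sxx = sum(x * x for x in xs)
    trace = []
    for _ in range(n_iter):
        cuts = [-math.inf] + alphas + [math.inf]  # cuts[j] = alpha^j
        # z | theta, x, y: N(-beta * x, 1) truncated to (alpha^{y-1}, alpha^y]
        zs = [truncated_normal(-beta * x, cuts[y - 1], cuts[y], rng)
              for x, y in zip(xs, ys)]
        # beta | z, x: normal, because z_i = -beta * x_i + noise and the prior is flat
        beta = rng.gauss(-sum(x * z for x, z in zip(xs, zs)) / sxx, 1.0 / math.sqrt(sxx))
        # alpha^j | z, y: uniform between max z in category j and min z in category j + 1
        for j in range(2, c):
            lower = max(z for z, y in zip(zs, ys) if y == j)
            upper = min(z for z, y in zip(zs, ys) if y == j + 1)
            alphas[j - 1] = rng.uniform(lower, upper)
        trace.append((list(alphas), beta))
    return trace

rng = random.Random(0)
xs, ys = simulate(300, [0.0, 0.8, 1.6], 1.0, rng)
trace = gibbs(xs, ys, c=4, n_iter=200, rng=rng)
```

Plotting the cutpoints stored in `trace` against the iteration index gives trajectories of the same kind as the "without MA" curves; the interval from which each cutpoint is drawn is bounded by neighbouring latent values, which is the mechanism behind the slow movement of $\alpha$.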

By making a projection $\theta \mapsto \alpha^2/\alpha^3$, we can visualize its locally degenerate
behavior. Figure 3 is the trajectory of $\alpha^2(i)/\alpha^3(i)$ $(i = 0,\ldots,m-1)$ for $m =
1000$. Therefore, even if "with MA" seemed to work well, it has degenerate behavior similar
to "without MA", and the parameter estimation may be biased.



[Figure: Trajectory of Gibbs samplers; horizontal axis ticks from 200 to 1000]

Figure 3: Trajectory for $\alpha^3/\alpha^2$. The solid line is without MA and the dashed line
is with MA. The horizontal line is the true value.

In the rest of this section, we prove the above results.

4.2 Approximation of the Gibbs sampler

We write $(M_n^*,\vartheta)$ for the minimal representation of $(M_n^\times,\vartheta)$. Note that the
minimal representation of $(M_n^\times,g\theta)$ is $(M_n^*,g\theta)$. In this subsection, we construct a formal approximation of $M_n^*$. The random Markov measure $M_n^*$ is
generated by $(\Pi_n^\times,K_n^\times)$ where

$$\begin{cases} \Pi_n^\times(x_n,y_n,d\vartheta) = P_n(d\vartheta|x_n,y_n)\\ K_n^\times(x_n,y_n,\vartheta,d\vartheta^*) = \int_{Z_n}P_n(dz_n|x_n,y_n,\vartheta)P_n(d\vartheta^*|x_n,y_n,z_n) \end{cases}$$

where $\vartheta = (\theta,g)$ and $\vartheta^* = (\theta^*,g^*)$. Although the parametric family $P(dxdydz|\vartheta)$
does not have the regularity required in Theorem 6.4 of [5], it has a similar
approximation.

First we remark on an important structure of the current model. The parameter
$\vartheta$ can be divided into $\vartheta_F$ and $\vartheta_M$, where the letter "F" means "(almost) fixed"
parameter and "M" means "unfixed (moving)" parameter. We have

$$P_n(d\vartheta^*|\,\cdot\,) = P_n(d\vartheta_F^*|\,\cdot\,)P_n(d\vartheta_M^*|x_n,y_n,z_n).$$

Depending on the model, $\vartheta_F = \theta$, $\vartheta_M = g$ for null-conditional update and
$\vartheta_F = \alpha$, $\vartheta_M = (\beta,g)$ for $\beta^Tx$-conditional update. See the following table. We
write $\Theta_F$ and $\Theta_M$ for the corresponding parameter spaces.

                          $\vartheta_F$    $\vartheta_M$
Null-conditional          $\theta$         $g$
$\beta^Tx$-conditional    $\alpha$         $(\beta,g)$

Table 2: $\vartheta_F$ and $\vartheta_M$

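We prepare notation for a) the Fisher information matrix, b) the central
value and c) the normalized score function for two models A) $\{P(dxdy|\vartheta); \vartheta \in
\Theta^\times\}$ and B) $\{P(dxdydz|\vartheta); \vartheta_M \in \Theta_M\}$ for fixed $\vartheta_F$. For fixed $\vartheta \in \Theta^\times$, we
write $X \overset{a}{=} Y$ if $X - Y$ tends in $P_n(dx_ndy_n|\vartheta)$-probability to $0$.

a) Write the Fisher information matrices as

$$I(\vartheta) = \begin{pmatrix} I_F(\vartheta) & I_{F,M}(\vartheta)\\ I_{M,F}(\vartheta) & I_M(\vartheta) \end{pmatrix}, \qquad K_M(\vartheta)$$

for models A) and B) respectively. We write $J_M(\vartheta) = K_M(\vartheta) - I_M(\vartheta)$.

b) Write the central values as

$$\hat\vartheta_n(x_n,y_n) = \begin{pmatrix} \hat\vartheta_{F,n}(x_n,y_n)\\ \hat\vartheta_{M,n}(x_n,y_n) \end{pmatrix}, \qquad \hat\vartheta_{M,n}(x_n,y_n,\vartheta_F)$$

for $P_n(d\vartheta|x_n,y_n)$ and $P_n(d\vartheta_M|x_n,y_n,\vartheta_F)$ respectively. Note that
$\hat\vartheta_{M,n}(x_n,y_n) \overset{a}{=} \hat\vartheta_{M,n}(x_n,y_n,\vartheta_F)$. We denote by $\hat I$, $\hat J_M$ and $\hat K_M$ the Fisher
information matrices $I(\vartheta)$, $J_M(\vartheta)$ and $K_M(\vartheta)$ at $\vartheta = \hat\vartheta_n(x_n,y_n)$.

c) Write the normalized score function of A) as

$$Z_n(x_n,y_n|\vartheta) = \begin{pmatrix} Z_{F,n}(x_n,y_n|\vartheta)\\ Z_{M,n}(x_n,y_n|\vartheta) \end{pmatrix}.$$

See (3.5) for the definition of the normalized score function.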

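Now we construct an approximation of $K_n^\times$. Since $\vartheta_F$ is an almost
fixed parameter,

$$K_n^\times(x_n,y_n,\vartheta,d\vartheta^*) \approx K_{M,n}(x_n,y_n,\vartheta,d\vartheta_M^*)\delta_{\vartheta_F}(d\vartheta_F^*)$$

(in a formal sense only) where

$$K_{M,n}(x_n,y_n,\vartheta,d\vartheta_M^*) = \int P_n(dz_n|x_n,y_n,\vartheta)P_n(d\vartheta_M^*|x_n,y_n,z_n).$$

As an update of $\vartheta_M$, $K_{M,n}$ is the transition kernel of a standard Gibbs sampler
for the parametric family B). Under regularity conditions, we can directly apply
Theorem 6.4 of [5] to the model B), which yields the normal approximation

$$N\big(\hat\vartheta_{M,n}(x_n,y_n,\vartheta_F) + \hat K_M^{-1}\hat J_M(\vartheta_M - \hat\vartheta_{M,n}(x_n,y_n,\vartheta_F)),\ n^{-1}\hat K_M^{-1} + n^{-1}\hat K_M^{-1}\hat J_M\hat K_M^{-1}\big).$$

Here we used $(\hat\vartheta_{M,n}(x_n,y_n),\vartheta_F) \overset{a}{=} (\hat\vartheta_{M,n}(x_n,y_n,\vartheta_F),\vartheta_F)$. We denote by $\tilde K_{M,n}$
this approximating probability transition kernel. We can rewrite $\hat\vartheta_{M,n}(x_n,y_n,\vartheta_F)$
using $\hat\vartheta_n(x_n,y_n)$. Under $P_n(dx_ndy_n|\vartheta)$,

$$\hat\vartheta_{M,n}(x_n,y_n,\vartheta_F) \overset{a}{=} \vartheta_M + n^{-1/2}\hat I_M^{-1}Z_{M,n}(x_n,y_n|\vartheta),$$

$$\hat\vartheta_n(x_n,y_n) \overset{a}{=} \vartheta + n^{-1/2}\hat I^{-1}Z_n(x_n,y_n|\vartheta).$$

The second relation gives $Z_{M,n} \overset{a}{=} n^{1/2}\{\hat I_{M,F}(\hat\vartheta_{F,n} - \vartheta_F) + \hat I_M(\hat\vartheta_{M,n} - \vartheta_M)\}$, and then by simple algebra,

$$n^{1/2}(\hat\vartheta_{M,n}(x_n,y_n,\vartheta_F) - \hat\vartheta_{M,n}(x_n,y_n)) \overset{a}{=} -\hat I_M^{-1}\hat I_{M,F}\,n^{1/2}(\vartheta_F - \hat\vartheta_{F,n}(x_n,y_n)).$$

This yields an approximation of $\tilde K_{M,n}$ by the normal distribution with mean

$$\hat\vartheta_{M,n}(x_n,y_n) + \hat K_M^{-1}\hat J_M(\vartheta_M - \hat\vartheta_{M,n}(x_n,y_n)) - \hat K_M^{-1}\hat I_{M,F}(\vartheta_F - \hat\vartheta_{F,n}(x_n,y_n))$$

and variance $n^{-1}\hat K_M^{-1} + n^{-1}\hat K_M^{-1}\hat J_M\hat K_M^{-1}$. We denote this normal approximation
by $\hat K_{M,n}$. Hence we obtain the approximation $\hat K_{M,n}(x_n,y_n,\vartheta,d\vartheta_M^*)\delta_{\vartheta_F}(d\vartheta_F^*)$ of
$K_n^\times$.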

4.3 Asymptotic properties of the Gibbs sampler 

In this subsection, we study asymptotic properties of the Gibbs sampler. This
validates the formal approximation of the previous subsection. We assume the following.

Assumption 4.2. 1. $\Lambda^\times$ has the form (4.2), and $\Lambda(d\theta) = \lambda(\theta)d\theta$ and $\Lambda_g(dg) =
\lambda_g(g)dg$ for Lebesgue measures $d\theta$ and $dg$, where $\lambda(\theta)$, $\lambda_g(g)$ are continuous
and strictly positive.

2. $f$ has a derivative $f'$ which is continuous and

$$K := \int\Big|1 + z\frac{f'(z)}{f(z)}\Big|^2f(z)dz \in (0,\infty).$$

With the above assumption, we can show that the null-conditional update produces a locally degenerate Gibbs sampler for the map $\theta \mapsto n^{1/2}(\theta - \hat\theta_n(x_n,y_n))$. For
probability transition kernels $\mu(x,dy)$ and $K(x,y,dz)$, we denote

$$(\mu \otimes K)(x,dy,dz) = \mu(x,dy)K(x,y,dz).$$

We write $\vartheta = (\alpha^2,\ldots,\alpha^{c-1},\beta,g)$ and $\vartheta^* = (\alpha^{2*},\ldots,\alpha^{c-1*},\beta^*,g^*)$ for elements of $\Theta^\times$. We also write $(\theta,g)$ or $(\alpha,\beta,g)$ for $\vartheta$ and $(\theta^*,g^*)$ or $(\alpha^*,\beta^*,g^*)$
for $\vartheta^*$ respectively.

Lemma 4.3. Under Assumptions 3.2 and 4.2, for the null-conditional update construction with marginal augmentation, the following value tends to $0$:

$$\int\min\{\sqrt{n}|\theta - \theta^*|, 1\}(\Pi_n^\times \otimes K_n^\times)(x_n,y_n,d\vartheta,d\vartheta^*)P_n(dx_ndy_n). \tag{4.4}$$

For the $\beta^Tx$-conditional update with marginal augmentation,

$$\int\min\{\sqrt{n}|\alpha - \alpha^*|, 1\}(\Pi_n^\times \otimes K_n^\times)(x_n,y_n,d\vartheta,d\vartheta^*)P_n(dx_ndy_n) \tag{4.5}$$

tends to $0$.

Proof. We only show the former since the proof for the latter is almost the same.
First we show tightness of $\sqrt{n}(\theta - \theta^*)$. We have $\sqrt{n}(\theta - \theta^*) = \sqrt{n}(\theta - \hat\theta_n(x_n,y_n)) -
\sqrt{n}(\theta^* - \hat\theta_n(x_n,y_n))$, and both terms on the right hand side have the same
law $\Pi_n^*(x_n,y_n,\cdot)$ defined after Corollary 3.7 under $(\Pi_n^\times \otimes K_n^\times)(x_n,y_n,d\vartheta,d\vartheta^*)$.
Hence the tightness of $\sqrt{n}(\theta - \theta^*)$ follows from Corollary 3.7. For any $\epsilon > 0$, fix
$C_\epsilon$ so that the probability of the event $\{\sqrt{n}|\theta - \theta^*| > C_\epsilon\}$ is lower than $\epsilon$ in the
limit. In the following, we only work on the event $\{\sqrt{n}|\theta - \theta^*| \le C_\epsilon\}$.

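As in the comment before Proposition 3.6, we consider simpler models. It is
sufficient to show the convergence of $\sqrt{n}|\theta_j - \theta_j^*|$ to $0$ for

$$\theta_j = (\alpha^j,\beta), \qquad \theta_j^* = (\alpha^{j*},\beta^*)$$

for $j = 2,\ldots,c-1$. For each $j$, for $\xi^i = (1,(x^i)^T)^T$,

$$\begin{cases} (\theta_j^*)^T\xi^i < z^i & \text{if } y^i \ge j+1\\ (\theta_j^*)^T\xi^i \ge z^i & \text{if } y^i \le j \end{cases}$$

since $\vartheta^*$ comes from $P_n(d\vartheta^*|x_n,y_n,z_n)$ (see (4.3)). By simple algebra, for each
fixed $\xi_0$,

$$\begin{cases} (\theta_j^* - \theta_j)^T\xi_0 < z^i - \theta_j^T\xi^i - (\theta_j^* - \theta_j)^T(\xi^i - \xi_0) & \text{if } y^i \ge j+1\\ (\theta_j^* - \theta_j)^T\xi_0 \ge z^i - \theta_j^T\xi^i - (\theta_j^* - \theta_j)^T(\xi^i - \xi_0) & \text{if } y^i \le j \end{cases}$$

Assume that $\xi_0$ is in the support of $P_\xi$ and set $r = \epsilon/(2C_\epsilon)$. By the above
inequality, we have

$$S_n - \frac{\epsilon}{2\sqrt{n}} \le (\theta_j^* - \theta_j)^T\xi_0 < T_n + \frac{\epsilon}{2\sqrt{n}}$$

where

$$S_n = \max_{i;\ y^i\le j,\ \xi^i\in B_r(\xi_0)}(z^i - \theta_j^T\xi^i), \qquad T_n = \min_{i;\ y^i\ge j+1,\ \xi^i\in B_r(\xi_0)}(z^i - \theta_j^T\xi^i).$$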

Now we show that the probabilities of the events $\{S_n < -\epsilon/(2\sqrt{n})\}$ and $\{T_n >
\epsilon/(2\sqrt{n})\}$ are negligible. Since the proofs are the same, we only treat $S_n$. The
event is

$$\{(x_n,y_n,z_n); S_n < -\epsilon/(2\sqrt{n})\} = \bigcap_{i=1}^{n}\{(x_n,y_n,z_n); (x^i,y^i,z^i) \in E\}$$

where $E \subset X \times Y \times Z$ is

$$E = \{y > j\} \cup \{\xi \notin B_r(\xi_0)\} \cup \{y \le j,\ \xi \in B_r(\xi_0),\ (z - \theta_j^T\xi) < -\epsilon/(2\sqrt{n})\}.$$

Note that $E^c = \{\xi \in B_r(\xi_0),\ 0 \ge (z - \theta_j^T\xi) \ge -\epsilon/(2\sqrt{n})\}$. When we write $p_n$
for the probability of the event $E^c$ with respect to $P(dxdydz|\vartheta)$, we have

$$\int 1(\{S_n < -\epsilon/(2\sqrt{n})\})P_n(dx_n,dy_ndz_n|\vartheta) = (1 - p_n)^n. \tag{4.6}$$

This value tends to $0$ if $\lim_{n\to\infty}np_n = +\infty$, and in fact $n^{1/2}p_n$ equals

$$\int_{\xi\in B_r(\xi_0)}n^{1/2}\Big[F(g\theta_j^T\xi) - F\Big(g\theta_j^T\xi - \frac{g\epsilon}{2\sqrt{n}}\Big)\Big]P_\xi(d\xi)$$

where the limit is strictly positive. Hence (4.6) tends to $0$ for each $\vartheta$. Its
integral with respect to $\Lambda^\times$ also tends to $0$ by the bounded convergence theorem. Hence
$\sqrt{n}(\theta_j^* - \theta_j)^T\xi_0$ tends in probability to $0$.

By showing the convergence of $\sqrt{n}(\theta_j^* - \theta_j)^T\xi_i$ for $i = 1,2,\ldots,p$, where $\xi_0,\ldots,\xi_p \in
\operatorname{supp} P_\xi$ satisfy $\operatorname{span}(\xi_0,\ldots,\xi_p) = \mathbb{R}^{p+1}$, the claim of the lemma follows. □
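The reason why (4.6) vanishes can be made explicit with the elementary bound $1 - t \le e^{-t}$. The constant $\kappa$ below is a symbol introduced only for this derivation; it denotes the strictly positive limit of $n^{1/2}p_n$ mentioned in the proof.

```latex
\begin{align*}
(1 - p_n)^n \;\le\; \exp(-n p_n)
  \;=\; \exp\bigl(-\sqrt{n}\,\cdot\,\sqrt{n}\,p_n\bigr),
\qquad \sqrt{n}\,p_n \;\to\; \kappa > 0 ,
\end{align*}
% so n p_n = \sqrt{n}\,(\sqrt{n}\,p_n) \to +\infty and therefore (1 - p_n)^n \to 0.
```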

For both cases, if $\{P(dxdydz|\vartheta); \vartheta_M \in \Theta_M\}$ for fixed $\vartheta_F \in \Theta_F$ has sufficient
regularity, then the convergence $\int\|(K_{M,n} - \tilde K_{M,n})(x_n,y_n,\vartheta,\cdot)\|P_n(dx_ndy_n|\vartheta) \to 0$
follows from the proof of [5], as described at the end of the previous subsection.

Let $\Sigma = \int xx^TP_X(dx)$, $\mu = \int xP_X(dx)$ and $L = \int\frac{f'(z)}{f(z)}\Big(\frac{f'(z)}{f(z)}z +
1\Big)f(z)dz$.

Lemma 4.4. For each update, $\{P(dxdydz|\vartheta); \vartheta_M \in \Theta_M\}$ for fixed $\vartheta_F \in \Theta_F$
is quadratic mean differentiable, having the same support for any $\vartheta_M \in \Theta_M$.
Moreover there exists a uniformly consistent test. In particular,

$$\int\|(K_{M,n} - \tilde K_{M,n})(x_n,y_n,\vartheta,\cdot)\|P_n(dx_ndy_n|\vartheta) \to 0.$$

Proof. For each conditional update, the quadratic mean differentiability of the
parametric family $\{P(dxdydz|\vartheta); \vartheta_M \in \Theta_M\}$ for fixed $\vartheta_F \in \Theta_F$ comes from the
continuity of the corresponding Fisher information matrices: for the null-conditional
update and the $\beta^Tx$-conditional update, the matrices are

$$K_M(\vartheta) = g^{-2}K, \qquad K_M(\vartheta) = \begin{pmatrix} g^2H\Sigma & L\mu\\ L\mu^T & g^{-2}K \end{pmatrix}$$

respectively, where $H = \int f'(z)^2/f(z)\,dz$. The condition on the support is clear. We show the existence
of a uniformly consistent test.

For the null-conditional update, write $\theta_0$ for the fixed parameter $\vartheta_F$. Consider
the submodel $\{P(dxdy|\theta); \theta = g\theta_0, g \in (0,\infty)\}$ of the original model. By Subsection 3.2, this submodel has a uniformly consistent test. Now consider the
re-parametrization $F : \theta \mapsto (\theta_0, |\theta|/|\theta_0|)$. Since $F$ is continuous, the re-parametrized
model, which is in fact $\{P(dxdy|\vartheta); \vartheta = (\theta_0,g), g \in (0,\infty)\}$, also has a uniformly
consistent test.

The same argument holds for the $\beta^Tx$-conditional update. This proves the claim. □

We omit the proof of the following proposition since it is similar to that of
Proposition 4.7.

Proposition 4.5. The standard Gibbs sampler without marginal augmentation
is not locally consistent except for $\beta^Tx$-conditional update for $c = 2$.

For the excepted case, local consistency holds. The proof follows directly
from Theorem 6.4 of [5].

Proposition 4.6. The standard Gibbs sampler without marginal augmentation
is locally consistent for $\beta^Tx$-conditional update for $c = 2$.

Proof. In this case, the regularity condition of Theorem 6.4 of [5] is satisfied.
Hence the claim holds. □

Proposition 4.7. The standard Gibbs sampler with marginal augmentation is
not locally consistent in the following cases:

1. $p \ge 2$ or $c \ge 3$ for null-conditional update.

2. $c \ge 4$ for $\beta^Tx$-conditional update.

Proof. For the null-conditional update with marginal augmentation, $\mathcal{M}_1 =
(M_n^*,\theta)$ is locally degenerate by Lemma 4.3 and Proposition 2.16. Therefore,
with $F(\theta) = \theta/|\theta|$, $\mathcal{M}_2 = \mathcal{M}_1^F = (M_n^*,\theta/|\theta|)$ is locally degenerate by Lemma
2.19. On the other hand, if $\mathcal{M}_3 = (M_n^*,g\theta)$ is locally consistent, then by the mapping $G(\theta) = \theta/|\theta|$, $\mathcal{M}_2 = \mathcal{M}_3^G$ should be locally consistent by Lemma 2.10.
Since $P_n(d\vartheta|x_n,y_n)$ is not degenerate with the scaling with map $F$ for $p \ge 2$ or
$c \ge 3$, this is impossible by Proposition 2.15. Hence $\mathcal{M}_3 = (M_n^*,g\theta)$ is not locally
consistent.

The argument is similar for the $\beta^Tx$-conditional update. In this case, $\mathcal{M}_1 = (M_n^*,\alpha)$
is locally degenerate and hence $\mathcal{M}_2 = (M_n^*,\alpha/|\alpha|)$ is also locally degenerate by the
map $F(\alpha) = \alpha/|\alpha|$. On the other hand, if $\mathcal{M}_3 = (M_n^*,g\theta)$ is locally consistent,
then $\mathcal{M}_2 = (M_n^*,\alpha/|\alpha|)$ should be locally consistent since $\mathcal{M}_2 = \mathcal{M}_3^G$ for the
map $G(\alpha,\beta) = \alpha/|\alpha|$. Since $P_n(d\vartheta|x_n,y_n)$ is not degenerate with the scaling
with map $F$ for $c \ge 4$, this is impossible. Hence $\mathcal{M}_3 = (M_n^*,g\theta)$ is not locally
consistent. □

Proposition 4.8. The standard Gibbs sampler with marginal augmentation is
locally consistent in the following cases:

1. Null-conditional update for $c = 2$ and $p = 1$.

2. $\beta^Tx$-conditional update for $c = 2, 3$.

Proof. For the null-conditional update case, consider $\mathcal{M}_1 = (M_n^*,g\theta)$ where $\theta = \beta$.
The probability transition kernel $K_n^*(x_n,y_n,\theta,d\theta^{**})$ of its minimal representation is

$$\int P_n(dz_n|x_n,y_n,(\theta,1))P_n(dg^*d\theta^*|x_n,y_n,z_n)\delta_{g^*\theta^*}(d\theta^{**}).$$

We show that we can replace $\theta^*$ by $\theta$ in the above transition kernel. Write
$\hat g_n(z_n)$ for the central value of $P_n(dg|x_n,y_n,z_n) = P_n(dg|z_n)$. First we apply
the Bernstein-von Mises theorem to $\{P(dxdydz|\vartheta); \vartheta = (1,g), g \in (0,\infty)\}$ for the
approximation $P_n(dg|x_n,y_n,z_n) \approx N(\hat g_n(z_n), n^{-1}K^{-1})$. By this approximation,
we can approximate $K_n^*$ by $L_n(x_n,y_n,\theta,d\theta^{**})$ defined by

$$\int P_n(dz_n|x_n,y_n,(\theta,1))\phi(\theta^{**}; \theta^*\hat g_n(z_n), n^{-1}(\theta^*)^2K^{-1})P_n(d\theta^*|x_n,y_n,z_n).$$

For some continuous function $C$, uniformly in $\theta^{**}$,

$$|\phi(\theta^{**}; \theta^*\hat g_n(z_n), n^{-1}(\theta^*)^2K^{-1}) - \phi(\theta^{**}; \theta\hat g_n(z_n), n^{-1}\theta^2K^{-1})| \le n^{1/2}|\theta^* - \theta|C(\hat g_n(z_n)).$$

Hence by tightness of $\hat g_n(z_n)$ and convergence of $n^{1/2}|\theta^* - \theta|$ to $0$ in probability,
we can replace $\theta^*$ in $L_n(x_n,y_n,\theta,d\theta^{**})$ by $\theta$ (see the proof of Theorem 6.4 of
[5]). Then, using the Bernstein-von Mises theorem again, it is valid to replace
$\theta^*$ in $K_n^*$ in the sense of $\int\|(K_n^* - \tilde K_n)(x_n,y_n,\theta,\cdot)\|P_n(dx_ndy_n|\theta) \to 0$, where
$\tilde K_n$ is the transition kernel after replacement of $\theta^*$ by $\theta$. We already have an
approximation of $\tilde K_n$. Therefore $\mathcal{M}_1$ is locally consistent by the convergence
in total variation, by the same argument as in the proof of Theorem 6.4 of [5].

By a similar argument, for the $\beta^Tx$-conditional update with $c = 2, 3$, $(M_n^*,\vartheta)$
or $(M_n^*,(\beta/\alpha, g\alpha))$ is locally consistent, respectively. Therefore $(M_n^*,g\theta)$
is locally consistent by the map $F(\vartheta) = g\theta$ for the former and $F(\beta,g) = (g\beta,g)$
for the latter. □